Impact of Regional Nutrition on Country's COVID-19 Recovery Rate

(OHHH NUTS!)

Combining Three Data Sets

1) World Health Organization - "Life Expectancy" of 2015 Source: Kaggle

2) Global Health Security Index - "Building Collective Action & Accountability" Index developed with The Economist Intelligence Unit for 2019 Source: Johns Hopkins Bloomberg School of Public Health Co-Funded by Bill & Melinda Gates Foundation, Robertson Foundation, & Open Philanthropy Project

3) COVID Global Data Set Source: Kaggle "Nutrition/Diet During COVID-19" This dataset shows the percentages of fat consumed from each type of food listed. The end of the dataset also includes obesity and undernourished percentage, and the percentage of COVID-19 Confirmed/Deaths/Recovered/Active cases. (Note: All the data have unit % except Population, which is just the population count). https://www.kaggle.com/mariaren/covid19-healthy-diet-dataset/data https://storage.googleapis.com/kagglesdsdata/datasets/618335/1175348/Fat_Supply_Quantity_Data.csv?GoogleAccessId=web-data@kaggle-161607.iam.gserviceaccount.com&Expires=1590357735&Signature=bWQ%2Fj3yRSui127xJMXfJ%2BtW6L0L5xx68ngVOg8U3ewhDulVfTGGUSSVyTSQmO5%2FPnt0%2FAXI7zhGfc6Keejv7bDqaK5mB6pZX4YzqHWPfiBo4s8Oa20eVOXlr2fMFdAZSiOLa6d2fpcia9QfebPZflAH4SgGEO5nuqbJLj%2FDOF0Mv1VHkRRp2BZ8dWsogc2jqxDIqISkOnFxasYodcbzL0lKKzPcfF7QJy%2ByzZj3osBjQ0OXCLwjiAT2DdFdwe%2FQ7Bkthv8fNAW8w9uXExH79t%2BL4QV%2F%2F10BKldwlht3mvzg18ZQYIvrmMtA%2BVLsNYBcvSIH5p%2FJCGN1v6fKnTNZXsw%3D%3D&response-content-disposition=attachment%3B+filename%3DFat_Supply_Quantity_Data.csv

Health Focus Due to Pandemic Observations

Healthcare Access

~#1 Key variables to measure healthcare experiencing shortages--in both low and middle-income countries: Ventilator and critical care beds.

~#2 May 6: Iran reached 100,00 COVID-19 cases and still ranked in top 5 globally for over 1 month. Source: https://twitter.com/WHOEMRO/status/1258096811786592256/photo/1

Regional Nutrition

~Observed COVID-19 impact on healthcare workers in my family follow a traditional Eastern Mediterranean diet that includes: Citrus fruits, Almonds, Nigella seeds on cheese or bread, Pistachios, and Pine nuts. Source: https://anjomanfood.com/nuts-in-the-middle-eastern-and-mediterranean-diet/

~Noticed Iran, Saudi Arabia, UAE (travel hub), Egypt, and Pakistan experienced cases with varying medical responses. Eastern Mediterranean region: Afghanistan, Bahrain, Egypt, Iran, Iraq, Jordan, Kuwait, Lebanon, Libya, Morocco, Pakistan, Palestine, Saudi Arabia, Syria, Tunisia, UAE, and Yemen.

~Tracked COVID-19 incidences across WHO maps.

~Excited to test my physican father's theory (and personal diet) out since we were all infected in January-February.

In [3]:
df
Out[3]:
Country Alcoholic Beverages Animal Products Animal fats Aquatic Products, Other Cereals - Excluding Beer Eggs Fish, Seafood Fruits - Excluding Wine Meat ... Vegetable Oils Vegetables Obesity Undernourished Confirmed Deaths Recovered Active Population Unit (all except Population)
0 Afghanistan 0.0000 21.6397 6.2224 0.0 8.0353 0.6859 0.0327 0.4246 6.1244 ... 17.0831 0.3593 4.5 29.8 0.021411 0.000492 0.002445 0.018474 38042000.0 %
1 Albania 0.0000 32.0002 3.4172 0.0 2.6734 1.6448 0.1445 0.6418 8.7428 ... 9.2443 0.6503 22.3 6.2 0.033730 0.001085 0.026522 0.006123 2858000.0 %
2 Algeria 0.0000 14.4175 0.8972 0.0 4.2035 1.2171 0.2008 0.5772 3.8961 ... 27.3606 0.5145 26.6 3.9 0.017375 0.001309 0.009142 0.006925 43406000.0 %
3 Angola 0.0000 15.3041 1.3130 0.0 6.5545 0.1539 1.4155 0.3488 11.0268 ... 22.4638 0.1231 6.8 25 0.000165 0.000010 0.000054 0.000102 31427000.0 %
4 Antigua and Barbuda 0.0000 27.7033 4.6686 0.0 3.2153 0.3872 1.5263 1.2177 14.3202 ... 14.4436 0.2469 19.1 NaN 0.025773 0.003093 0.019588 0.003093 97000.0 %
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
165 Venezuela (Bolivarian Republic of) 0.0000 16.3261 2.2673 0.0 2.5449 0.6555 0.5707 0.9640 7.0949 ... 29.5211 0.1851 25.2 21.2 0.002890 0.000035 0.000919 0.001936 28516000.0 %
166 Vietnam 0.0000 33.2484 3.8238 0.0 3.7155 0.7839 1.1217 0.4079 26.4292 ... 5.6211 0.6373 2.1 9.3 0.000339 0.000000 0.000275 0.000064 95656000.0 %
167 Yemen 0.0000 12.5401 2.0131 0.0 11.5271 0.5514 0.3847 0.2564 8.0010 ... 23.6312 0.1667 14.1 38.9 0.000631 0.000103 0.000017 0.000511 29162000.0 %
168 Zambia 0.0783 9.6005 1.6113 0.0 14.3225 0.6266 1.0070 0.1343 4.9010 ... 15.2848 0.1567 6.5 46.7 0.004658 0.000039 0.001103 0.003516 17861000.0 %
169 Zimbabwe 0.0000 10.3796 2.9543 0.0 9.7922 0.3682 0.2455 0.0614 4.5674 ... 26.9396 0.0789 12.3 51.3 0.000328 0.000027 0.000123 0.000178 14645000.0 %

170 rows × 32 columns

In [4]:
df.columns
Out[4]:
Index(['Country', 'Alcoholic Beverages', 'Animal Products', 'Animal fats',
       'Aquatic Products, Other', 'Cereals - Excluding Beer', 'Eggs',
       'Fish, Seafood', 'Fruits - Excluding Wine', 'Meat', 'Miscellaneous',
       'Milk - Excluding Butter', 'Offals', 'Oilcrops', 'Pulses', 'Spices',
       'Starchy Roots', 'Stimulants', 'Sugar Crops', 'Sugar & Sweeteners',
       'Treenuts', 'Vegetal Products', 'Vegetable Oils', 'Vegetables',
       'Obesity', 'Undernourished', 'Confirmed', 'Deaths', 'Recovered',
       'Active', 'Population', 'Unit (all except Population)'],
      dtype='object')

World Health Organization Life Expectancy Data Set

In [5]:
postgres_user = 'dsbc_student'
postgres_pw = '7*.8G9QH21'
postgres_host = '142.93.121.174'
postgres_port = '5432'
postgres_db = 'lifeexpectancy'

engine = create_engine('postgresql://{}:{}@{}:{}/{}'.format(
    postgres_user, postgres_pw, postgres_host, postgres_port, postgres_db))

life_df = pd.read_sql_query('select * from lifeexpectancy',con=engine)

engine.dispose()
In [6]:
life_df
Out[6]:
Country Year Status Life expectancy Adult Mortality infant deaths Alcohol percentage expenditure Hepatitis B Measles ... Polio Total expenditure Diphtheria HIV/AIDS GDP Population thinness 1-19 years thinness 5-9 years Income composition of resources Schooling
0 Afghanistan 2015 Developing 65.0 263.0 62 0.01 71.279624 65.0 1154 ... 6.0 8.16 65.0 0.1 584.259210 33736494.0 17.2 17.3 0.479 10.1
1 Afghanistan 2014 Developing 59.9 271.0 64 0.01 73.523582 62.0 492 ... 58.0 8.18 62.0 0.1 612.696514 327582.0 17.5 17.5 0.476 10.0
2 Afghanistan 2013 Developing 59.9 268.0 66 0.01 73.219243 64.0 430 ... 62.0 8.13 64.0 0.1 631.744976 31731688.0 17.7 17.7 0.470 9.9
3 Afghanistan 2012 Developing 59.5 272.0 69 0.01 78.184215 67.0 2787 ... 67.0 8.52 67.0 0.1 669.959000 3696958.0 17.9 18.0 0.463 9.8
4 Afghanistan 2011 Developing 59.2 275.0 71 0.01 7.097109 68.0 3013 ... 68.0 7.87 68.0 0.1 63.537231 2978599.0 18.2 18.2 0.454 9.5
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2933 Zimbabwe 2004 Developing 44.3 723.0 27 4.36 0.000000 68.0 31 ... 67.0 7.13 65.0 33.6 454.366654 12777511.0 9.4 9.4 0.407 9.2
2934 Zimbabwe 2003 Developing 44.5 715.0 26 4.06 0.000000 7.0 998 ... 7.0 6.52 68.0 36.7 453.351155 12633897.0 9.8 9.9 0.418 9.5
2935 Zimbabwe 2002 Developing 44.8 73.0 25 4.43 0.000000 73.0 304 ... 73.0 6.53 71.0 39.8 57.348340 125525.0 1.2 1.3 0.427 10.0
2936 Zimbabwe 2001 Developing 45.3 686.0 25 1.72 0.000000 76.0 529 ... 76.0 6.16 75.0 42.1 548.587312 12366165.0 1.6 1.7 0.427 9.8
2937 Zimbabwe 2000 Developing 46.0 665.0 24 1.68 0.000000 79.0 1483 ... 78.0 7.10 78.0 43.5 547.358878 12222251.0 11.0 11.2 0.434 9.8

2938 rows × 22 columns

In [7]:
#df.loc[df['shield'] > 6]
life_df.loc[life_df['Life expectancy '] > 70]
Out[7]:
Country Year Status Life expectancy Adult Mortality infant deaths Alcohol percentage expenditure Hepatitis B Measles ... Polio Total expenditure Diphtheria HIV/AIDS GDP Population thinness 1-19 years thinness 5-9 years Income composition of resources Schooling
16 Albania 2015 Developing 77.8 74.0 0 4.60 364.975229 99.0 0 ... 99.0 6.00 99.0 0.1 3954.227830 28873.0 1.2 1.3 0.762 14.2
17 Albania 2014 Developing 77.5 8.0 0 4.51 428.749067 98.0 0 ... 98.0 5.88 98.0 0.1 4575.763787 288914.0 1.2 1.3 0.761 14.2
18 Albania 2013 Developing 77.2 84.0 0 4.76 430.876979 99.0 0 ... 99.0 5.66 99.0 0.1 4414.723140 289592.0 1.3 1.4 0.759 14.2
19 Albania 2012 Developing 76.9 86.0 0 5.14 412.443356 99.0 9 ... 99.0 5.59 99.0 0.1 4247.614380 2941.0 1.3 1.4 0.752 14.2
20 Albania 2011 Developing 76.6 88.0 0 5.37 437.062100 99.0 28 ... 99.0 5.71 99.0 0.1 4437.178680 295195.0 1.4 1.5 0.738 13.3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2885 Viet Nam 2004 Developing 74.2 136.0 29 2.86 0.000000 94.0 217 ... 96.0 5.90 96.0 0.2 NaN NaN 15.4 16.1 0.601 11.0
2886 Viet Nam 2003 Developing 74.0 137.0 30 2.19 0.000000 78.0 2297 ... 96.0 4.84 99.0 0.2 NaN NaN 15.6 16.2 0.592 10.9
2887 Viet Nam 2002 Developing 73.8 137.0 30 2.03 0.000000 NaN 6755 ... 92.0 4.70 75.0 0.2 NaN NaN 15.6 16.3 0.584 10.7
2888 Viet Nam 2001 Developing 73.6 138.0 32 1.84 0.000000 NaN 12058 ... 96.0 5.17 96.0 0.1 NaN NaN 15.7 16.4 0.576 10.6
2889 Viet Nam 2000 Developing 73.4 139.0 33 1.60 0.000000 NaN 16512 ... 96.0 4.89 96.0 0.1 NaN NaN 15.8 16.4 0.569 10.4

1620 rows × 22 columns

In [9]:
life_df.columns
Out[9]:
Index(['Country', 'Year', 'Status', 'Life expectancy ', 'Adult Mortality',
       'infant deaths', 'Alcohol', 'percentage expenditure', 'Hepatitis B',
       'Measles ', ' BMI ', 'under-five deaths ', 'Polio', 'Total expenditure',
       'Diphtheria ', ' HIV/AIDS', 'GDP', 'Population',
       ' thinness  1-19 years', ' thinness 5-9 years',
       'Income composition of resources', 'Schooling'],
      dtype='object')

Exploratory Data Analysis

1) Join data sets df and life_df to be covid_df. Perform any data cleaning. 2) Explore data to discover relationships and features. We will explore the data using both statistics and data visualization. 3) Perform feature engineering by reviewing the most important variables by conducting 'feature importance' and transforming into features to predict, classify, or measure our target variable.

Exploratory Note:

Both data sets include countries as columns, which will serve as the key. However, the 'life_df' set includes multiple years for each country, so we will select the most recent year for each country to join as the key when we append the data sets.

In [12]:
life_df2 = life_df['Year'].max()
life_df2
Out[12]:
2015
In [13]:
life_df.nunique()
Out[13]:
Country                             193
Year                                 16
Status                                2
Life expectancy                     362
Adult Mortality                     425
infant deaths                       209
Alcohol                            1076
percentage expenditure             2328
Hepatitis B                          87
Measles                             958
 BMI                                608
under-five deaths                   252
Polio                                73
Total expenditure                   818
Diphtheria                           81
 HIV/AIDS                           200
GDP                                2490
Population                         2278
 thinness  1-19 years               200
 thinness 5-9 years                 207
Income composition of resources     625
Schooling                           173
dtype: int64
In [14]:
df.nunique()
Out[14]:
Country                         170
Alcoholic Beverages               3
Animal Products                 170
Animal fats                     169
Aquatic Products, Other           6
Cereals - Excluding Beer        170
Eggs                            169
Fish, Seafood                   170
Fruits - Excluding Wine         168
Meat                            170
Miscellaneous                   137
Milk - Excluding Butter         169
Offals                          167
Oilcrops                        170
Pulses                          160
Spices                          155
Starchy Roots                   166
Stimulants                      169
Sugar Crops                      11
Sugar & Sweeteners                9
Treenuts                        162
Vegetal Products                170
Vegetable Oils                  170
Vegetables                      168
Obesity                         120
Undernourished                   98
Confirmed                       161
Deaths                          145
Recovered                       161
Active                          155
Population                      170
Unit (all except Population)      1
dtype: int64
In [16]:
df.columns
Out[16]:
Index(['Country', 'Alcoholic Beverages', 'Animal Products', 'Animal fats',
       'Aquatic Products, Other', 'Cereals - Excluding Beer', 'Eggs',
       'Fish, Seafood', 'Fruits - Excluding Wine', 'Meat', 'Miscellaneous',
       'Milk - Excluding Butter', 'Offals', 'Oilcrops', 'Pulses', 'Spices',
       'Starchy Roots', 'Stimulants', 'Sugar Crops', 'Sugar & Sweeteners',
       'Treenuts', 'Vegetal Products', 'Vegetable Oils', 'Vegetables',
       'Obesity', 'Undernourished', 'Confirmed', 'Deaths', 'Recovered',
       'Active', 'Population', 'Unit (all except Population)'],
      dtype='object')
In [17]:
life_df.columns
Out[17]:
Index(['Country', 'Year', 'Status', 'Life expectancy ', 'Adult Mortality',
       'infant deaths', 'Alcohol', 'percentage expenditure', 'Hepatitis B',
       'Measles ', ' BMI ', 'under-five deaths ', 'Polio', 'Total expenditure',
       'Diphtheria ', ' HIV/AIDS', 'GDP', 'Population',
       ' thinness  1-19 years', ' thinness 5-9 years',
       'Income composition of resources', 'Schooling'],
      dtype='object')

Exploratory Note: We will drop less used from first data frame:

Alcoholic Beverages, Sugar Crops, Sugar & Sweeteners ,Undernourished, Miscellaneous,Aquatic Products, Other,Undernourished,

In [18]:
#Drop less used from first data frame: Alcoholic Beverages, Sugar Crops, Sugar & Sweeteners ,Undernourished, Miscellaneous,Aquatic Products, Other,Undernourished,  
df2 = df.drop(['Alcoholic Beverages','Aquatic Products, Other','Sugar & Sweeteners','Sugar Crops','Miscellaneous','Undernourished','Aquatic Products, Other',], axis=1)
df2
Out[18]:
Country Animal Products Animal fats Cereals - Excluding Beer Eggs Fish, Seafood Fruits - Excluding Wine Meat Milk - Excluding Butter Offals ... Vegetal Products Vegetable Oils Vegetables Obesity Confirmed Deaths Recovered Active Population Unit (all except Population)
0 Afghanistan 21.6397 6.2224 8.0353 0.6859 0.0327 0.4246 6.1244 8.2803 0.3103 ... 28.3684 17.0831 0.3593 4.5 0.021411 0.000492 0.002445 0.018474 38042000.0 %
1 Albania 32.0002 3.4172 2.6734 1.6448 0.1445 0.6418 8.7428 17.7576 0.2933 ... 17.9998 9.2443 0.6503 22.3 0.033730 0.001085 0.026522 0.006123 2858000.0 %
2 Algeria 14.4175 0.8972 4.2035 1.2171 0.2008 0.5772 3.8961 8.0934 0.1067 ... 35.5857 27.3606 0.5145 26.6 0.017375 0.001309 0.009142 0.006925 43406000.0 %
3 Angola 15.3041 1.3130 6.5545 0.1539 1.4155 0.3488 11.0268 1.2309 0.1539 ... 34.7010 22.4638 0.1231 6.8 0.000165 0.000010 0.000054 0.000102 31427000.0 %
4 Antigua and Barbuda 27.7033 4.6686 3.2153 0.3872 1.5263 1.2177 14.3202 6.6607 0.1347 ... 22.2995 14.4436 0.2469 19.1 0.025773 0.003093 0.019588 0.003093 97000.0 %
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
165 Venezuela (Bolivarian Republic of) 16.3261 2.2673 2.5449 0.6555 0.5707 0.9640 7.0949 5.5217 0.2082 ... 33.6855 29.5211 0.1851 25.2 0.002890 0.000035 0.000919 0.001936 28516000.0 %
166 Vietnam 33.2484 3.8238 3.7155 0.7839 1.1217 0.4079 26.4292 0.7520 0.3378 ... 16.7548 5.6211 0.6373 2.1 0.000339 0.000000 0.000275 0.000064 95656000.0 %
167 Yemen 12.5401 2.0131 11.5271 0.5514 0.3847 0.2564 8.0010 1.3463 0.2436 ... 37.4535 23.6312 0.1667 14.1 0.000631 0.000103 0.000017 0.000511 29162000.0 %
168 Zambia 9.6005 1.6113 14.3225 0.6266 1.0070 0.1343 4.9010 1.2756 0.1790 ... 40.3939 15.2848 0.1567 6.5 0.004658 0.000039 0.001103 0.003516 17861000.0 %
169 Zimbabwe 10.3796 2.9543 9.7922 0.3682 0.2455 0.0614 4.5674 2.1040 0.1315 ... 39.6248 26.9396 0.0789 12.3 0.000328 0.000027 0.000123 0.000178 14645000.0 %

170 rows × 26 columns

In [19]:
#Drop less used from second data frame, Life data frame:
life_df2 = life_df.drop(['Measles ','Hepatitis B','infant deaths',' thinness  1-19 years','Alcohol', ' thinness 5-9 years','GDP','Total expenditure'], axis=1)
life_df2
Out[19]:
Country Year Status Life expectancy Adult Mortality percentage expenditure BMI under-five deaths Polio Diphtheria HIV/AIDS Population Income composition of resources Schooling
0 Afghanistan 2015 Developing 65.0 263.0 71.279624 19.1 83 6.0 65.0 0.1 33736494.0 0.479 10.1
1 Afghanistan 2014 Developing 59.9 271.0 73.523582 18.6 86 58.0 62.0 0.1 327582.0 0.476 10.0
2 Afghanistan 2013 Developing 59.9 268.0 73.219243 18.1 89 62.0 64.0 0.1 31731688.0 0.470 9.9
3 Afghanistan 2012 Developing 59.5 272.0 78.184215 17.6 93 67.0 67.0 0.1 3696958.0 0.463 9.8
4 Afghanistan 2011 Developing 59.2 275.0 7.097109 17.2 97 68.0 68.0 0.1 2978599.0 0.454 9.5
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2933 Zimbabwe 2004 Developing 44.3 723.0 0.000000 27.1 42 67.0 65.0 33.6 12777511.0 0.407 9.2
2934 Zimbabwe 2003 Developing 44.5 715.0 0.000000 26.7 41 7.0 68.0 36.7 12633897.0 0.418 9.5
2935 Zimbabwe 2002 Developing 44.8 73.0 0.000000 26.3 40 73.0 71.0 39.8 125525.0 0.427 10.0
2936 Zimbabwe 2001 Developing 45.3 686.0 0.000000 25.9 39 76.0 75.0 42.1 12366165.0 0.427 9.8
2937 Zimbabwe 2000 Developing 46.0 665.0 0.000000 25.5 39 78.0 78.0 43.5 12222251.0 0.434 9.8

2938 rows × 14 columns

In [20]:
life_df.tail(140)
Out[20]:
Country Year Status Life expectancy Adult Mortality infant deaths Alcohol percentage expenditure Hepatitis B Measles ... Polio Total expenditure Diphtheria HIV/AIDS GDP Population thinness 1-19 years thinness 5-9 years Income composition of resources Schooling
2798 United States of America 2011 Developed 78.7 16.0 25 8.67 0.0 91.0 220 ... 94.0 17.60 96.0 0.1 NaN NaN 0.7 0.6 NaN NaN
2799 United States of America 2010 Developed 78.7 15.0 25 8.55 0.0 92.0 63 ... 93.0 17.20 95.0 0.1 NaN NaN 0.7 0.6 NaN NaN
2800 United States of America 2009 Developed 78.5 18.0 26 8.71 0.0 92.0 71 ... 93.0 17.00 95.0 0.1 NaN NaN 0.7 0.6 NaN NaN
2801 United States of America 2008 Developed 78.2 18.0 27 8.74 0.0 94.0 140 ... 94.0 16.20 96.0 0.1 NaN NaN 0.7 0.6 NaN NaN
2802 United States of America 2007 Developed 78.1 11.0 27 8.74 0.0 93.0 43 ... 93.0 15.57 96.0 0.1 NaN NaN 0.7 0.6 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2933 Zimbabwe 2004 Developing 44.3 723.0 27 4.36 0.0 68.0 31 ... 67.0 7.13 65.0 33.6 454.366654 12777511.0 9.4 9.4 0.407 9.2
2934 Zimbabwe 2003 Developing 44.5 715.0 26 4.06 0.0 7.0 998 ... 7.0 6.52 68.0 36.7 453.351155 12633897.0 9.8 9.9 0.418 9.5
2935 Zimbabwe 2002 Developing 44.8 73.0 25 4.43 0.0 73.0 304 ... 73.0 6.53 71.0 39.8 57.348340 125525.0 1.2 1.3 0.427 10.0
2936 Zimbabwe 2001 Developing 45.3 686.0 25 1.72 0.0 76.0 529 ... 76.0 6.16 75.0 42.1 548.587312 12366165.0 1.6 1.7 0.427 9.8
2937 Zimbabwe 2000 Developing 46.0 665.0 24 1.68 0.0 79.0 1483 ... 78.0 7.10 78.0 43.5 547.358878 12222251.0 11.0 11.2 0.434 9.8

140 rows × 22 columns

Join datasets by key 'Country'

We will join both dataframes and select only Year == 2015 from the life_df2 dataframe to create a new dataframe: covid_df.

Univariate Analysis

In [23]:
covid_df.corr()
Out[23]:
Animal Products Animal fats Cereals - Excluding Beer Eggs Fish, Seafood Fruits - Excluding Wine Meat Milk - Excluding Butter Offals Oilcrops ... Adult Mortality percentage expenditure BMI under-five deaths Polio Diphtheria HIV/AIDS Population_y Income composition of resources Schooling
Animal Products 1.000000 0.696317 -0.464931 0.470419 -0.021730 -0.112444 0.736699 0.634589 0.065823 -0.436742 ... -0.401995 0.113623 0.427160 -0.204313 0.385766 0.286291 -0.374073 -0.175792 0.689126 0.616102
Animal fats 0.696317 1.000000 -0.407189 0.281349 -0.118791 -0.168326 0.231632 0.342875 -0.179648 -0.340163 ... -0.353027 -0.008614 0.369455 -0.039779 0.279839 0.239359 -0.313355 -0.083317 0.594815 0.561922
Cereals - Excluding Beer -0.464931 -0.407189 1.000000 -0.301811 -0.041983 0.016833 -0.278022 -0.276145 0.281311 0.113556 ... 0.400019 -0.025796 -0.429014 0.174882 -0.423820 -0.275711 0.511248 0.063692 -0.629482 -0.569581
Eggs 0.470419 0.281349 -0.301811 1.000000 0.206953 -0.061869 0.242745 0.273953 -0.122609 -0.342333 ... -0.369550 0.076464 0.230078 -0.109621 0.259077 0.208234 -0.381427 0.056293 0.562982 0.457319
Fish, Seafood -0.021730 -0.118791 -0.041983 0.206953 1.000000 0.025822 0.015310 -0.272491 -0.091862 0.346553 ... -0.044788 -0.071822 -0.130329 -0.063458 -0.013412 -0.044516 -0.075533 0.033109 0.003294 -0.006498
Fruits - Excluding Wine -0.112444 -0.168326 0.016833 -0.061869 0.025822 1.000000 -0.026609 -0.062986 0.075238 0.050115 ... 0.027664 0.006149 -0.068760 -0.039160 0.034929 -0.028214 -0.047952 0.022113 -0.096971 -0.061888
Meat 0.736699 0.231632 -0.278022 0.242745 0.015310 -0.026609 1.000000 0.150011 0.229228 -0.241335 ... -0.146436 -0.019060 0.258219 -0.261477 0.223089 0.099342 -0.146077 -0.189613 0.384123 0.345947
Milk - Excluding Butter 0.634589 0.342875 -0.276145 0.273953 -0.272491 -0.062986 0.150011 1.000000 0.050456 -0.416321 ... -0.340831 0.307515 0.312060 -0.064060 0.307814 0.297828 -0.302312 -0.118948 0.439168 0.371274
Offals 0.065823 -0.179648 0.281311 -0.122609 -0.091862 0.075238 0.229228 0.050456 1.000000 -0.012253 ... 0.306247 0.120770 -0.160281 -0.009059 -0.184620 -0.339397 0.283079 -0.036222 -0.261051 -0.244990
Oilcrops -0.436742 -0.340163 0.113556 -0.342333 0.346553 0.050115 -0.241335 -0.416321 -0.012253 1.000000 ... 0.132772 -0.010549 -0.212996 0.040473 -0.244861 -0.186438 0.110726 -0.017289 -0.377121 -0.349467
Pulses -0.425094 -0.315595 0.406925 -0.331933 -0.093786 0.483089 -0.318487 -0.190892 0.056014 0.149362 ... 0.288155 -0.035522 -0.351309 0.288831 -0.110936 -0.118573 0.185870 -0.003634 -0.511181 -0.472831
Spices -0.183738 -0.200584 0.123360 -0.002724 0.226539 0.011036 -0.162154 -0.084517 -0.090423 0.110913 ... 0.058863 -0.048218 -0.165202 0.138913 0.015188 0.110768 -0.040758 0.025912 -0.136138 -0.138357
Starchy Roots -0.389973 -0.303520 0.210341 -0.351431 0.169265 0.440201 -0.170990 -0.387129 0.132172 0.298163 ... 0.420362 -0.045116 -0.349777 0.145597 -0.211763 -0.295134 0.302615 0.121661 -0.448664 -0.341699
Stimulants 0.507705 0.280966 -0.264135 0.278077 0.005564 -0.089096 0.306149 0.474636 -0.027747 -0.286485 ... -0.299139 -0.022776 0.246330 -0.205219 0.263951 0.226711 -0.253963 -0.145858 0.455781 0.380227
Treenuts 0.159286 0.161809 -0.205887 0.292638 0.160767 -0.091491 -0.056260 0.196338 -0.146111 -0.214664 ... -0.304072 0.022282 0.247513 -0.080001 0.193674 0.218086 -0.170558 0.024855 0.301874 0.279961
Vegetal Products -1.000000 -0.696306 0.464931 -0.470465 0.021678 0.112282 -0.736701 -0.634573 -0.065814 0.436730 ... 0.401972 -0.113612 -0.427149 0.204334 -0.385817 -0.286321 0.374038 0.175868 -0.689133 -0.616118
Vegetable Oils -0.662161 -0.369446 0.005906 -0.197051 -0.249428 -0.070143 -0.558249 -0.361765 -0.197600 -0.226120 ... 0.218357 -0.112548 -0.151018 0.133938 -0.125381 -0.111301 0.159511 0.190629 -0.261272 -0.221994
Vegetables 0.083535 -0.083130 0.044259 0.164613 -0.008991 0.032403 0.003719 0.246313 0.081951 -0.131136 ... -0.041244 0.138704 0.037330 0.045392 0.066256 0.063391 -0.160218 0.093412 0.034154 -0.029775
Obesity 0.430380 0.391204 -0.497187 0.309372 -0.158797 -0.092355 0.271295 0.272539 -0.251688 -0.157551 ... -0.467360 0.005017 0.770613 -0.322088 0.304861 0.298984 -0.373918 -0.075649 0.692220 0.633168
Confirmed 0.357898 0.353236 -0.362804 0.291238 0.053155 -0.056920 0.151191 0.238475 -0.171835 -0.247925 ... -0.392181 -0.040137 0.418583 -0.156785 0.261332 0.169787 -0.227340 -0.017993 0.531878 0.471506
Deaths 0.226872 0.329210 -0.295061 0.129442 -0.025360 -0.065152 0.032719 0.155998 -0.173631 -0.178322 ... -0.294966 -0.029003 0.314081 -0.095783 0.170265 0.138092 -0.153854 0.021039 0.396425 0.381687
Recovered 0.338004 0.318047 -0.288803 0.187248 -0.009306 -0.048483 0.178391 0.227385 -0.145488 -0.224722 ... -0.337419 -0.024235 0.360196 -0.128309 0.211297 0.157846 -0.187125 -0.007453 0.472942 0.423891
Active 0.213949 0.214089 -0.278007 0.301388 0.121780 -0.036138 0.053351 0.138807 -0.114165 -0.157233 ... -0.270716 -0.041738 0.288604 -0.120871 0.203096 0.099637 -0.171249 -0.031949 0.343856 0.292468
Population_x 0.000845 0.019125 -0.005294 0.141391 -0.012514 -0.043501 0.006982 -0.053587 0.083154 -0.033425 ... -0.009411 -0.021705 -0.109352 0.687921 0.006104 0.016436 -0.044615 0.118447 -0.031850 -0.041698
Year NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Life expectancy 0.627147 0.538355 -0.553566 0.564881 0.046156 -0.036785 0.312417 0.435685 -0.296445 -0.344197 ... -0.769827 0.044123 0.539841 -0.270825 0.524219 0.455457 -0.611583 -0.064462 0.907585 0.819553
Adult Mortality -0.401995 -0.353027 0.400019 -0.369550 -0.044788 0.027664 -0.146436 -0.340831 0.306247 0.132772 ... 1.000000 -0.043049 -0.394249 0.209851 -0.402464 -0.286336 0.664830 0.057692 -0.635897 -0.532971
percentage expenditure 0.113623 -0.008614 -0.025796 0.076464 -0.071822 0.006149 -0.019060 0.307515 0.120770 -0.010549 ... -0.043049 1.000000 0.037973 -0.015977 -0.000456 0.035585 -0.038246 -0.020951 0.011441 0.014842
BMI 0.427160 0.369455 -0.429014 0.230078 -0.130329 -0.068760 0.258219 0.312060 -0.160281 -0.212996 ... -0.394249 0.037973 1.000000 -0.236015 0.288257 0.217996 -0.299818 -0.006142 0.617654 0.592342
under-five deaths -0.204313 -0.039779 0.174882 -0.109621 -0.063458 -0.039160 -0.261477 -0.064060 -0.009059 0.040473 ... 0.209851 -0.015977 -0.236015 1.000000 -0.164226 -0.153950 0.134492 0.308069 -0.247259 -0.250599
Polio 0.385766 0.279839 -0.423820 0.259077 -0.013412 0.034929 0.223089 0.307814 -0.184620 -0.244861 ... -0.402464 -0.000456 0.288257 -0.164226 1.000000 0.646003 -0.433104 -0.267904 0.494363 0.409904
Diphtheria 0.286291 0.239359 -0.275711 0.208234 -0.044516 -0.028214 0.099342 0.297828 -0.339397 -0.186438 ... -0.286336 0.035585 0.217996 -0.153950 0.646003 1.000000 -0.341930 -0.081235 0.425759 0.385533
HIV/AIDS -0.374073 -0.313355 0.511248 -0.381427 -0.075533 -0.047952 -0.146077 -0.302312 0.283079 0.110726 ... 0.664830 -0.038246 -0.299818 0.134492 -0.433104 -0.341930 1.000000 0.044697 -0.492405 -0.393279
Population_y -0.175792 -0.083317 0.063692 0.056293 0.033109 0.022113 -0.189613 -0.118948 -0.036222 -0.017289 ... 0.057692 -0.020951 -0.006142 0.308069 -0.267904 -0.081235 0.044697 1.000000 -0.004053 0.012996
Income composition of resources 0.689126 0.594815 -0.629482 0.562982 0.003294 -0.096971 0.384123 0.439168 -0.261051 -0.377121 ... -0.635897 0.011441 0.617654 -0.247259 0.494363 0.425759 -0.492405 -0.004053 1.000000 0.926394
Schooling 0.616102 0.561922 -0.569581 0.457319 -0.006498 -0.061888 0.345947 0.371274 -0.244990 -0.349467 ... -0.532971 0.014842 0.592342 -0.250599 0.409904 0.385533 -0.393279 0.012996 0.926394 1.000000

36 rows × 36 columns

In [24]:
ProfileReport(covid_df)
Out[24]:

Overview

Dataset info

Number of variables 39
Number of observations 157
Total Missing (%) 0.8%
Total size in memory 49.1 KiB
Average record size in memory 320.0 B

Variables types

Numeric 33
Categorical 1
Boolean 0
Date 0
Text (Unique) 1
Rejected 4
Unsupported 0

Warnings

Variables

Country
Categorical, Unique

First 3 values
Argentina
Peru
Cyprus
Last 3 values
Gabon
France
Djibouti

First 10 values

Value Count Frequency (%)  
Afghanistan 1 0.6%
 
Albania 1 0.6%
 
Algeria 1 0.6%
 
Angola 1 0.6%
 
Antigua and Barbuda 1 0.6%
 

Last 10 values

Value Count Frequency (%)  
Vanuatu 1 0.6%
 
Venezuela (Bolivarian Republic of) 1 0.6%
 
Yemen 1 0.6%
 
Zambia 1 0.6%
 
Zimbabwe 1 0.6%
 

Animal Products
Numeric

Distinct count 157
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 20.595
Minimum 5.0182
Maximum 36.902
Zeros (%) 0.0%

Quantile statistics

Minimum 5.0182
5-th percentile 7.0546
Q1 14.418
Median 20.928
Q3 26.91
95-th percentile 32.835
Maximum 36.902
Range 31.884
Interquartile range 12.492

Descriptive statistics

Standard deviation 8.0638
Coef of variation 0.39154
Kurtosis -0.92571
Mean 20.595
MAD 6.8101
Skewness -0.066656
Sum 3233.5
Variance 65.026
Memory size 7.5 KiB
Value Count Frequency (%)  
10.8323 1 0.6%
 
25.9903 1 0.6%
 
28.4111 1 0.6%
 
26.6163 1 0.6%
 
17.4631 1 0.6%
 
32.6886 1 0.6%
 
19.0932 1 0.6%
 
25.8451 1 0.6%
 
26.7378 1 0.6%
 
20.5571 1 0.6%
 
Other values (147) 147 93.6%
 

Minimum 5 values

Value Count Frequency (%)  
5.0182 1 0.6%
 
5.3063 1 0.6%
 
5.9931 1 0.6%
 
6.0418 1 0.6%
 
6.0747 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
34.1264 1 0.6%
 
34.4402 1 0.6%
 
35.4131 1 0.6%
 
36.725 1 0.6%
 
36.9018 1 0.6%
 

Animal fats
Numeric

Distinct count 156
Unique (%) 99.4%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 4.1841
Minimum 0.0348
Maximum 14.937
Zeros (%) 0.0%

Quantile statistics

Minimum 0.0348
5-th percentile 0.53388
Q1 1.6113
Median 3.3013
Q3 6.3787
95-th percentile 10.556
Maximum 14.937
Range 14.902
Interquartile range 4.7674

Descriptive statistics

Standard deviation 3.379
Coef of variation 0.80759
Kurtosis 0.64579
Mean 4.1841
MAD 2.7235
Skewness 1.0943
Sum 656.9
Variance 11.418
Memory size 7.5 KiB
Value Count Frequency (%)  
3.3076 2 1.3%
 
1.8698 1 0.6%
 
0.6241 1 0.6%
 
2.9392 1 0.6%
 
6.2224 1 0.6%
 
6.1 1 0.6%
 
0.4154 1 0.6%
 
9.8102 1 0.6%
 
3.0066 1 0.6%
 
3.4472 1 0.6%
 
Other values (146) 146 93.0%
 

Minimum 5 values

Value Count Frequency (%)  
0.0348 1 0.6%
 
0.1678 1 0.6%
 
0.2548 1 0.6%
 
0.3195 1 0.6%
 
0.33899999999999997 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
12.6234 1 0.6%
 
12.8517 1 0.6%
 
13.9753 1 0.6%
 
14.2498 1 0.6%
 
14.9373 1 0.6%
 

Cereals - Excluding Beer
Numeric

Distinct count 157
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 4.4243
Minimum 0.9908
Maximum 18.376
Zeros (%) 0.0%

Quantile statistics

Minimum 0.9908
5-th percentile 1.2728
Q1 2.0428
Median 3.3592
Q3 5.6789
95-th percentile 10.72
Maximum 18.376
Range 17.386
Interquartile range 3.6361

Descriptive statistics

Standard deviation 3.1996
Coef of variation 0.72317
Kurtosis 2.433
Mean 4.4243
MAD 2.4375
Skewness 1.5249
Sum 694.62
Variance 10.237
Memory size 7.5 KiB
Value Count Frequency (%)  
4.9807 1 0.6%
 
4.3092 1 0.6%
 
13.0891 1 0.6%
 
1.078 1 0.6%
 
1.4883 1 0.6%
 
2.0003 1 0.6%
 
3.4479 1 0.6%
 
7.1664 1 0.6%
 
7.986000000000001 1 0.6%
 
3.8355 1 0.6%
 
Other values (147) 147 93.6%
 

Minimum 5 values

Value Count Frequency (%)  
0.9908 1 0.6%
 
1.078 1 0.6%
 
1.1241 1 0.6%
 
1.1278 1 0.6%
 
1.1778 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
12.436 1 0.6%
 
13.0891 1 0.6%
 
13.4988 1 0.6%
 
14.3225 1 0.6%
 
18.3763 1 0.6%
 

Eggs
Numeric

Distinct count 156
Unique (%) 99.4%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.95439
Minimum 0.058
Maximum 3.2756
Zeros (%) 0.0%

Quantile statistics

Minimum 0.058
5-th percentile 0.13722
Q1 0.3682
Median 0.8991
Q3 1.2664
95-th percentile 2.2859
Maximum 3.2756
Range 3.2176
Interquartile range 0.8982

Descriptive statistics

Standard deviation 0.65851
Coef of variation 0.68998
Kurtosis 1.2164
Mean 0.95439
MAD 0.51167
Skewness 1.0108
Sum 149.84
Variance 0.43364
Memory size 7.5 KiB
Value Count Frequency (%)  
0.8991 2 1.3%
 
0.8448 1 0.6%
 
0.32899999999999996 1 0.6%
 
1.3259 1 0.6%
 
2.4481 1 0.6%
 
0.825 1 0.6%
 
1.7484 1 0.6%
 
0.5249 1 0.6%
 
0.6203 1 0.6%
 
1.5706 1 0.6%
 
Other values (146) 146 93.0%
 

Minimum 5 values

Value Count Frequency (%)  
0.057999999999999996 1 0.6%
 
0.0701 1 0.6%
 
0.0744 1 0.6%
 
0.0746 1 0.6%
 
0.1074 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
2.6186 1 0.6%
 
2.7596 1 0.6%
 
2.8961 1 0.6%
 
3.1241 1 0.6%
 
3.2756 1 0.6%
 

Fish, Seafood
Numeric

Distinct count 157
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.84838
Minimum 0.0174
Maximum 8.4068
Zeros (%) 0.0%

Quantile statistics

Minimum 0.0174
5-th percentile 0.09486
Q1 0.3245
Median 0.5708
Q3 1.0457
95-th percentile 2.3237
Maximum 8.4068
Range 8.3894
Interquartile range 0.7212

Descriptive statistics

Standard deviation 0.94856
Coef of variation 1.1181
Kurtosis 26.937
Mean 0.84838
MAD 0.58949
Skewness 4.1429
Sum 133.19
Variance 0.89977
Memory size 7.5 KiB
Value Count Frequency (%)  
0.4515 1 0.6%
 
0.5746 1 0.6%
 
0.1302 1 0.6%
 
3.2666 1 0.6%
 
0.5633 1 0.6%
 
0.4514 1 0.6%
 
0.7451 1 0.6%
 
0.0962 1 0.6%
 
1.169 1 0.6%
 
0.1482 1 0.6%
 
Other values (147) 147 93.6%
 

Minimum 5 values

Value Count Frequency (%)  
0.0174 1 0.6%
 
0.0315 1 0.6%
 
0.0327 1 0.6%
 
0.0559 1 0.6%
 
0.0587 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
2.7774 1 0.6%
 
3.0833 1 0.6%
 
3.2666 1 0.6%
 
4.8461 1 0.6%
 
8.4068 1 0.6%
 

Fruits - Excluding Wine
Numeric

Distinct count 156
Unique (%) 99.4%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.54967
Minimum 0.0373
Maximum 9.6727
Zeros (%) 0.0%

Quantile statistics

Minimum 0.0373
5-th percentile 0.08512
Q1 0.2388
Median 0.3556
Q3 0.5791
95-th percentile 1.3878
Maximum 9.6727
Range 9.6354
Interquartile range 0.3403

Descriptive statistics

Standard deviation 0.86907
Coef of variation 1.5811
Kurtosis 79.295
Mean 0.54967
MAD 0.37635
Skewness 7.968
Sum 86.298
Variance 0.75529
Memory size 2.5 KiB
Value Count Frequency (%)  
0.2987 2 1.3%
 
1.5030000000000001 1 0.6%
 
1.2516 1 0.6%
 
0.0614 1 0.6%
 
0.7014 1 0.6%
 
1.2151 1 0.6%
 
0.5791 1 0.6%
 
0.1339 1 0.6%
 
0.2938 1 0.6%
 
0.3433 1 0.6%
 
Other values (146) 146 93.0%
 

Minimum 5 values

Value Count Frequency (%)  
0.0373 1 0.6%
 
0.042 1 0.6%
 
0.0443 1 0.6%
 
0.0614 1 0.6%
 
0.0624 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
1.6382 1 0.6%
 
1.6804 1 0.6%
 
2.8436 1 0.6%
 
3.6133 1 0.6%
 
9.6727 1 0.6%
 

Meat
Numeric

Distinct count 157
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 9.2145
Minimum 0.9061
Maximum 22.878
Zeros (%) 0.0%

Quantile statistics

Minimum 0.9061
5-th percentile 2.9911
Q1 6.1244
Median 9.0171
Q3 11.558
95-th percentile 17.65
Maximum 22.878
Range 21.972
Interquartile range 5.4337

Descriptive statistics

Standard deviation 4.4521
Coef of variation 0.48317
Kurtosis 0.31683
Mean 9.2145
MAD 3.4562
Skewness 0.62422
Sum 1446.7
Variance 19.822
Memory size 7.5 KiB
Value Count Frequency (%)  
3.3685 1 0.6%
 
11.5581 1 0.6%
 
9.4166 1 0.6%
 
8.6212 1 0.6%
 
9.7764 1 0.6%
 
11.5636 1 0.6%
 
2.4838 1 0.6%
 
9.6514 1 0.6%
 
6.7594 1 0.6%
 
2.8993 1 0.6%
 
Other values (147) 147 93.6%
 

Minimum 5 values

Value Count Frequency (%)  
0.9061 1 0.6%
 
1.3488 1 0.6%
 
1.8407 1 0.6%
 
2.0269999999999997 1 0.6%
 
2.0412 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
19.2693 1 0.6%
 
20.2172 1 0.6%
 
21.0223 1 0.6%
 
21.6062 1 0.6%
 
22.8778 1 0.6%
 

Milk - Excluding Butter
Numeric

Distinct count 156
Unique (%) 99.4%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 5.2451
Minimum 0.1779
Maximum 17.758
Zeros (%) 0.0%

Quantile statistics

Minimum 0.1779
5-th percentile 0.56352
Q1 2.2937
Median 5.1279
Q3 7.4411
95-th percentile 11.056
Maximum 17.758
Range 17.58
Interquartile range 5.1474

Descriptive statistics

Standard deviation 3.3636
Coef of variation 0.64127
Kurtosis 0.46226
Mean 5.2451
MAD 2.6747
Skewness 0.65741
Sum 823.49
Variance 11.314
Memory size 7.5 KiB
Value Count Frequency (%)  
5.86 2 1.3%
 
5.9146 1 0.6%
 
5.2308 1 0.6%
 
7.4043 1 0.6%
 
10.3934 1 0.6%
 
11.5125 1 0.6%
 
2.2937 1 0.6%
 
4.8443 1 0.6%
 
4.548 1 0.6%
 
8.3355 1 0.6%
 
Other values (146) 146 93.0%
 

Minimum 5 values

Value Count Frequency (%)  
0.1779 1 0.6%
 
0.2243 1 0.6%
 
0.2438 1 0.6%
 
0.2553 1 0.6%
 
0.4024 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
11.8155 1 0.6%
 
12.5363 1 0.6%
 
14.2068 1 0.6%
 
14.275 1 0.6%
 
17.7576 1 0.6%
 

Offals
Numeric

Distinct count 154
Unique (%) 98.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.14842
Minimum 0
Maximum 0.7268
Zeros (%) 1.3%

Quantile statistics

Minimum 0
5-th percentile 0.01812
Q1 0.0763
Median 0.1221
Q3 0.189
95-th percentile 0.33644
Maximum 0.7268
Range 0.7268
Interquartile range 0.1127

Descriptive statistics

Standard deviation 0.11534
Coef of variation 0.77713
Kurtosis 6.8648
Mean 0.14842
MAD 0.081521
Skewness 2.1107
Sum 23.302
Variance 0.013304
Memory size 2.5 KiB
Value Count Frequency (%)  
0.1563 2 1.3%
 
0.0 2 1.3%
 
0.09699999999999999 2 1.3%
 
0.2145 1 0.6%
 
0.1 1 0.6%
 
0.1178 1 0.6%
 
0.055 1 0.6%
 
0.2175 1 0.6%
 
0.0288 1 0.6%
 
0.1627 1 0.6%
 
Other values (144) 144 91.7%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 2 1.3%
 
0.0033 1 0.6%
 
0.0065 1 0.6%
 
0.0074 1 0.6%
 
0.0078 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
0.4322 1 0.6%
 
0.4784 1 0.6%
 
0.5957 1 0.6%
 
0.6717 1 0.6%
 
0.7268 1 0.6%
 

Oilcrops
Numeric

Distinct count 157
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 3.3572
Minimum 0.064
Maximum 28.564
Zeros (%) 0.0%

Quantile statistics

Minimum 0.064
5-th percentile 0.35396
Q1 0.7686
Median 1.566
Q3 3.4389
95-th percentile 12.664
Maximum 28.564
Range 28.5
Interquartile range 2.6703

Descriptive statistics

Standard deviation 4.8372
Coef of variation 1.4408
Kurtosis 10.419
Mean 3.3572
MAD 3.0245
Skewness 3.0302
Sum 527.08
Variance 23.398
Memory size 7.5 KiB
Value Count Frequency (%)  
0.8931 1 0.6%
 
1.9462 1 0.6%
 
3.3157 1 0.6%
 
1.4978 1 0.6%
 
13.4216 1 0.6%
 
3.0787 1 0.6%
 
3.1993 1 0.6%
 
3.6545 1 0.6%
 
0.8909 1 0.6%
 
0.6488 1 0.6%
 
Other values (147) 147 93.6%
 

Minimum 5 values

Value Count Frequency (%)  
0.064 1 0.6%
 
0.0895 1 0.6%
 
0.1003 1 0.6%
 
0.1007 1 0.6%
 
0.1259 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
16.8666 1 0.6%
 
20.7704 1 0.6%
 
23.1779 1 0.6%
 
27.1892 1 0.6%
 
28.5639 1 0.6%
 

Pulses
Numeric

Distinct count 148
Unique (%) 94.3%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.27069
Minimum 0
Maximum 2.6909
Zeros (%) 4.5%

Quantile statistics

Minimum 0
5-th percentile 0.00868
Q1 0.0427
Median 0.1483
Q3 0.3794
95-th percentile 0.85764
Maximum 2.6909
Range 2.6909
Interquartile range 0.3367

Descriptive statistics

Standard deviation 0.37877
Coef of variation 1.3993
Kurtosis 18.259
Mean 0.27069
MAD 0.24108
Skewness 3.6127
Sum 42.498
Variance 0.14347
Memory size 7.5 KiB
Value Count Frequency (%)  
0.0 7 4.5%
 
0.0658 2 1.3%
 
0.0353 2 1.3%
 
0.012 2 1.3%
 
0.1365 1 0.6%
 
0.2034 1 0.6%
 
0.0427 1 0.6%
 
0.0152 1 0.6%
 
0.2175 1 0.6%
 
0.2743 1 0.6%
 
Other values (138) 138 87.9%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 7 4.5%
 
0.0078 1 0.6%
 
0.0089 1 0.6%
 
0.0101 1 0.6%
 
0.0105 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
1.0624 1 0.6%
 
1.1084 1 0.6%
 
1.4398 1 0.6%
 
2.5545 1 0.6%
 
2.6909 1 0.6%
 

Spices
Numeric

Distinct count 143
Unique (%) 91.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.28085
Minimum 0
Maximum 2.6851
Zeros (%) 6.4%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0.0377
Median 0.0994
Q3 0.3181
95-th percentile 1.1925
Maximum 2.6851
Range 2.6851
Interquartile range 0.2804

Descriptive statistics

Standard deviation 0.46074
Coef of variation 1.6406
Kurtosis 10.247
Mean 0.28085
MAD 0.29841
Skewness 2.9727
Sum 44.093
Variance 0.21228
Memory size 2.5 KiB
Value Count Frequency (%)  
0.0 10 6.4%
 
0.0836 2 1.3%
 
0.1235 2 1.3%
 
0.043 2 1.3%
 
0.0336 2 1.3%
 
0.0103 2 1.3%
 
0.1697 1 0.6%
 
0.0414 1 0.6%
 
0.0319 1 0.6%
 
0.0285 1 0.6%
 
Other values (133) 133 84.7%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 10 6.4%
 
0.0052 1 0.6%
 
0.0058 1 0.6%
 
0.0079 1 0.6%
 
0.008 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
1.4302 1 0.6%
 
1.7594 1 0.6%
 
2.2196 1 0.6%
 
2.597 1 0.6%
 
2.6851 1 0.6%
 

Starchy Roots
Numeric

Distinct count 154
Unique (%) 98.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.2263
Minimum 0.0124
Maximum 2.1778
Zeros (%) 0.0%

Quantile statistics

Minimum 0.0124
5-th percentile 0.02806
Q1 0.0481
Median 0.0877
Q3 0.1985
95-th percentile 0.94442
Maximum 2.1778
Range 2.1654
Interquartile range 0.1504

Descriptive statistics

Standard deviation 0.36724
Coef of variation 1.6228
Kurtosis 13.003
Mean 0.2263
MAD 0.22338
Skewness 3.3642
Sum 35.528
Variance 0.13486
Memory size 2.5 KiB
Value Count Frequency (%)  
0.0513 2 1.3%
 
0.0329 2 1.3%
 
0.0565 2 1.3%
 
1.0806 1 0.6%
 
0.1103 1 0.6%
 
1.0609 1 0.6%
 
0.0697 1 0.6%
 
0.8018 1 0.6%
 
0.0965 1 0.6%
 
0.0516 1 0.6%
 
Other values (144) 144 91.7%
 

Minimum 5 values

Value Count Frequency (%)  
0.0124 1 0.6%
 
0.0168 1 0.6%
 
0.0207 1 0.6%
 
0.0217 1 0.6%
 
0.0247 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
1.2621 1 0.6%
 
1.3555 1 0.6%
 
2.0087 1 0.6%
 
2.1636 1 0.6%
 
2.1778 1 0.6%
 

Stimulants
Numeric

Distinct count 156
Unique (%) 99.4%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.62499
Minimum 0
Maximum 3.3838
Zeros (%) 1.3%

Quantile statistics

Minimum 0
5-th percentile 0.02604
Q1 0.1128
Median 0.3788
Q3 0.8506
95-th percentile 2.2073
Maximum 3.3838
Range 3.3838
Interquartile range 0.7378

Descriptive statistics

Standard deviation 0.69622
Coef of variation 1.114
Kurtosis 2.6621
Mean 0.62499
MAD 0.52418
Skewness 1.6985
Sum 98.123
Variance 0.48473
Memory size 2.5 KiB
Value Count Frequency (%)  
0.0 2 1.3%
 
1.6874 1 0.6%
 
0.1213 1 0.6%
 
1.5137 1 0.6%
 
0.4161 1 0.6%
 
2.0044 1 0.6%
 
1.7184 1 0.6%
 
0.8266 1 0.6%
 
0.0671 1 0.6%
 
0.7926 1 0.6%
 
Other values (146) 146 93.0%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 2 1.3%
 
0.0176 1 0.6%
 
0.0186 1 0.6%
 
0.0193 1 0.6%
 
0.0204 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
2.6726 1 0.6%
 
2.6783 1 0.6%
 
2.7774 1 0.6%
 
2.7855 1 0.6%
 
3.3838 1 0.6%
 

Treenuts
Numeric

Distinct count 149
Unique (%) 94.9%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.69474
Minimum 0
Maximum 4.9756
Zeros (%) 5.1%

Quantile statistics

Minimum 0
5-th percentile 0.00552
Q1 0.1366
Median 0.4339
Q3 0.9018
95-th percentile 2.0332
Maximum 4.9756
Range 4.9756
Interquartile range 0.7652

Descriptive statistics

Standard deviation 0.83028
Coef of variation 1.1951
Kurtosis 9.1961
Mean 0.69474
MAD 0.56087
Skewness 2.6207
Sum 109.07
Variance 0.68937
Memory size 2.5 KiB
Value Count Frequency (%)  
0.0 8 5.1%
 
1.3357 2 1.3%
 
0.3628 1 0.6%
 
0.3404 1 0.6%
 
0.7754 1 0.6%
 
1.181 1 0.6%
 
1.3194 1 0.6%
 
0.9355 1 0.6%
 
0.9704 1 0.6%
 
0.853 1 0.6%
 
Other values (139) 139 88.5%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 8 5.1%
 
0.0069 1 0.6%
 
0.0073 1 0.6%
 
0.0111 1 0.6%
 
0.0112 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
2.911 1 0.6%
 
3.3116 1 0.6%
 
3.8246 1 0.6%
 
4.9044 1 0.6%
 
4.9756 1 0.6%
 

Vegetal Products
Numeric

Distinct count 157
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 29.405
Minimum 13.098
Maximum 44.982
Zeros (%) 0.0%

Quantile statistics

Minimum 13.098
5-th percentile 17.166
Q1 23.09
Median 29.075
Q3 35.586
95-th percentile 42.95
Maximum 44.982
Range 31.884
Interquartile range 12.496

Descriptive statistics

Standard deviation 8.0636
Coef of variation 0.27423
Kurtosis -0.92577
Mean 29.405
MAD 6.8099
Skewness 0.066471
Sum 4616.5
Variance 65.021
Memory size 7.5 KiB
Value Count Frequency (%)  
33.2667 1 0.6%
 
42.7706 1 0.6%
 
31.4598 1 0.6%
 
30.9068 1 0.6%
 
24.1576 1 0.6%
 
26.9158 1 0.6%
 
23.2622 1 0.6%
 
43.8301 1 0.6%
 
28.3194 1 0.6%
 
35.5857 1 0.6%
 
Other values (147) 147 93.6%
 

Minimum 5 values

Value Count Frequency (%)  
13.0982 1 0.6%
 
13.2732 1 0.6%
 
14.585 1 0.6%
 
15.5628 1 0.6%
 
15.8736 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
43.9253 1 0.6%
 
43.9582 1 0.6%
 
44.0022 1 0.6%
 
44.6892 1 0.6%
 
44.9818 1 0.6%
 

Vegetable Oils
Numeric

Distinct count 157
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 18.613
Minimum 4.9549
Maximum 36.419
Zeros (%) 0.0%

Quantile statistics

Minimum 4.9549
5-th percentile 8.3122
Q1 13.868
Median 18.173
Q3 23.554
95-th percentile 30.025
Maximum 36.419
Range 31.464
Interquartile range 9.6857

Descriptive statistics

Standard deviation 6.7794
Coef of variation 0.36423
Kurtosis -0.60667
Mean 18.613
MAD 5.5039
Skewness 0.17244
Sum 2922.2
Variance 45.96
Memory size 7.5 KiB
Value Count Frequency (%)  
29.9945 1 0.6%
 
18.4369 1 0.6%
 
17.3147 1 0.6%
 
21.215 1 0.6%
 
14.4436 1 0.6%
 
27.4593 1 0.6%
 
14.1945 1 0.6%
 
18.8819 1 0.6%
 
18.8603 1 0.6%
 
11.9281 1 0.6%
 
Other values (147) 147 93.6%
 

Minimum 5 values

Value Count Frequency (%)  
4.9549 1 0.6%
 
6.4849 1 0.6%
 
6.7 1 0.6%
 
7.1538 1 0.6%
 
7.2939 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
30.643 1 0.6%
 
31.449 1 0.6%
 
33.0391 1 0.6%
 
34.0479 1 0.6%
 
36.4186 1 0.6%
 

Vegetables
Numeric

Distinct count 156
Unique (%) 99.4%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.30309
Minimum 0.0263
Maximum 1.1538
Zeros (%) 0.0%

Quantile statistics

Minimum 0.0263
5-th percentile 0.0793
Q1 0.1729
Median 0.248
Q3 0.3612
95-th percentile 0.72558
Maximum 1.1538
Range 1.1275
Interquartile range 0.1883

Descriptive statistics

Standard deviation 0.20382
Coef of variation 0.67246
Kurtosis 3.3816
Mean 0.30309
MAD 0.14944
Skewness 1.7208
Sum 47.585
Variance 0.041542
Memory size 2.5 KiB
Value Count Frequency (%)  
0.1567 2 1.3%
 
0.3066 1 0.6%
 
0.2938 1 0.6%
 
0.9395 1 0.6%
 
0.2694 1 0.6%
 
0.0665 1 0.6%
 
0.1896 1 0.6%
 
0.4514 1 0.6%
 
0.2019 1 0.6%
 
0.1305 1 0.6%
 
Other values (146) 146 93.0%
 

Minimum 5 values

Value Count Frequency (%)  
0.0263 1 0.6%
 
0.0431 1 0.6%
 
0.0665 1 0.6%
 
0.0738 1 0.6%
 
0.0746 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
0.8535 1 0.6%
 
0.8717 1 0.6%
 
0.9395 1 0.6%
 
1.1118 1 0.6%
 
1.1538 1 0.6%
 

Obesity
Numeric

Distinct count 113
Unique (%) 72.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 18.88
Minimum 2.9
Maximum 45.6
Zeros (%) 0.0%

Quantile statistics

Minimum 2.9
5-th percentile 4.48
Q1 8.6
Median 21.3
Q3 25.7
95-th percentile 32.02
Maximum 45.6
Range 42.7
Interquartile range 17.1

Descriptive statistics

Standard deviation 9.6162
Coef of variation 0.50933
Kurtosis -0.69653
Mean 18.88
MAD 8.1745
Skewness -0.018764
Sum 2964.2
Variance 92.472
Memory size 2.5 KiB
Value Count Frequency (%)  
4.5 4 2.5%
 
25.7 4 2.5%
 
27.4 3 1.9%
 
7.1 3 1.9%
 
6.0 3 1.9%
 
19.4 2 1.3%
 
8.2 2 1.3%
 
22.3 2 1.3%
 
23.8 2 1.3%
 
26.6 2 1.3%
 
Other values (103) 130 82.8%
 

Minimum 5 values

Value Count Frequency (%)  
2.9 1 0.6%
 
3.4 1 0.6%
 
3.5 1 0.6%
 
3.6 1 0.6%
 
3.8 2 1.3%
 

Maximum 5 values

Value Count Frequency (%)  
35.0 1 0.6%
 
37.0 1 0.6%
 
37.3 1 0.6%
 
45.5 1 0.6%
 
45.6 1 0.6%
 

Confirmed
Numeric

Distinct count 152
Unique (%) 96.8%
Missing (%) 3.8%
Missing (n) 6
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.085396
Minimum 4.7059e-05
Maximum 0.64048
Zeros (%) 0.0%

Quantile statistics

Minimum 4.7059e-05
5-th percentile 0.00061365
Q1 0.0056523
Median 0.023208
Q3 0.12528
95-th percentile 0.36721
Maximum 0.64048
Range 0.64044
Interquartile range 0.11963

Descriptive statistics

Standard deviation 0.12869
Coef of variation 1.507
Kurtosis 3.9173
Mean 0.085396
MAD 0.095922
Skewness 2.0409
Sum 12.895
Variance 0.016561
Memory size 7.5 KiB
Value Count Frequency (%)  
0.020043836232332 1 0.6%
 
0.0140169194865811 1 0.6%
 
0.0514036458333333 1 0.6%
 
0.11669987321137501 1 0.6%
 
0.0144264955943732 1 0.6%
 
0.0045710669840600205 1 0.6%
 
0.17795408507764998 1 0.6%
 
0.34251610858772596 1 0.6%
 
0.0703065134099617 1 0.6%
 
0.184714370452867 1 0.6%
 
Other values (141) 141 89.8%
 
(Missing) 6 3.8%
 

Minimum 5 values

Value Count Frequency (%)  
4.70588235294118e-05 1 0.6%
 
0.000165462818595475 1 0.6%
 
0.000266292922214436 1 0.6%
 
0.00032775691362239704 1 0.6%
 
0.000347076615601495 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
0.48859312270902394 1 0.6%
 
0.49230613484510993 1 0.6%
 
0.494030548297325 1 0.6%
 
0.499445983379501 1 0.6%
 
0.6404838709677421 1 0.6%
 

Deaths
Numeric

Distinct count 139
Unique (%) 88.5%
Missing (%) 3.8%
Missing (n) 6
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.004342
Minimum 0
Maximum 0.079857
Zeros (%) 8.9%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0.00015992
Median 0.00055147
Q3 0.0028374
95-th percentile 0.02523
Maximum 0.079857
Range 0.079857
Interquartile range 0.0026775

Descriptive statistics

Standard deviation 0.011129
Coef of variation 2.563
Kurtosis 20.82
Mean 0.004342
MAD 0.005737
Skewness 4.2922
Sum 0.65564
Variance 0.00012385
Memory size 2.5 KiB
Value Count Frequency (%)  
0.0 14 8.9%
 
0.0038834951456310704 1 0.6%
 
1.61039239894788e-05 1 0.6%
 
0.0122991527899503 1 0.6%
 
0.000435172148982465 1 0.6%
 
0.00134167519090325 1 0.6%
 
0.00034088018315950097 1 0.6%
 
0.000513384671800513 1 0.6%
 
0.0592441526989994 1 0.6%
 
0.0013599999999999999 1 0.6%
 
Other values (128) 128 81.5%
 
(Missing) 6 3.8%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 14 8.9%
 
4.4611390180140805e-06 1 0.6%
 
6.990807088678391e-06 1 0.6%
 
7.41592198450072e-06 1 0.6%
 
9.54593184204665e-06 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
0.0372447987555901 1 0.6%
 
0.0433954406638492 1 0.6%
 
0.0535752754992129 1 0.6%
 
0.0592441526989994 1 0.6%
 
0.0798568685634491 1 0.6%
 

Recovered
Numeric

Distinct count 152
Unique (%) 96.8%
Missing (%) 3.8%
Missing (n) 6
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.044633
Minimum 0
Maximum 0.60129
Zeros (%) 0.6%

Quantile statistics

Minimum 0
5-th percentile 0.00015598
Q1 0.0023399
Median 0.01
Q3 0.04947
95-th percentile 0.16746
Maximum 0.60129
Range 0.60129
Interquartile range 0.04713

Descriptive statistics

Standard deviation 0.08753
Coef of variation 1.9611
Kurtosis 17.411
Mean 0.044633
MAD 0.052744
Skewness 3.7913
Sum 6.7396
Variance 0.0076615
Memory size 7.5 KiB
Value Count Frequency (%)  
0.18888808664259898 1 0.6%
 
0.0141660917762923 1 0.6%
 
0.0006961616007912661 1 0.6%
 
0.0291976674039815 1 0.6%
 
0.025568069551472 1 0.6%
 
0.0353780617678381 1 0.6%
 
0.000379346680716544 1 0.6%
 
0.00276223420490386 1 0.6%
 
0.00304002444240758 1 0.6%
 
0.13798709552459199 1 0.6%
 
Other values (141) 141 89.8%
 
(Missing) 6 3.8%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 1 0.6%
 
1.7145600438927402e-05 1 0.6%
 
5.4093613771597706e-05 1 0.6%
 
0.000108851792039544 1 0.6%
 
0.000122908842608399 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
0.319452764854588 1 0.6%
 
0.324311712552497 1 0.6%
 
0.426402105689411 1 0.6%
 
0.495567867036011 1 0.6%
 
0.601290322580645 1 0.6%
 

Active
Numeric

Distinct count 148
Unique (%) 94.3%
Missing (%) 3.8%
Missing (n) 6
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.036418
Minimum 0
Maximum 0.35315
Zeros (%) 3.2%

Quantile statistics

Minimum 0
5-th percentile 7.1749e-05
Q1 0.0017281
Median 0.0087378
Q3 0.030314
95-th percentile 0.19483
Maximum 0.35315
Range 0.35315
Interquartile range 0.028586

Descriptive statistics

Standard deviation 0.065191
Coef of variation 1.7901
Kurtosis 6.6996
Mean 0.036418
MAD 0.0441
Skewness 2.5637
Sum 5.4991
Variance 0.0042498
Memory size 7.5 KiB
Value Count Frequency (%)  
0.0 5 3.2%
 
0.00116597557728155 1 0.6%
 
0.00023376368454393798 1 0.6%
 
0.22089247520902203 1 0.6%
 
0.00798325231123725 1 0.6%
 
0.000525816739936212 1 0.6%
 
0.00110803324099723 1 0.6%
 
0.14995322731524802 1 0.6%
 
0.0031978126485972397 1 0.6%
 
8.225860675378931e-06 1 0.6%
 
Other values (137) 137 87.3%
 
(Missing) 6 3.8%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 5 3.2%
 
8.225860675378931e-06 1 0.6%
 
4.70588235294118e-05 1 0.6%
 
7.007708479327261e-05 1 0.6%
 
7.342143906020559e-05 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
0.22089247520902203 1 0.6%
 
0.223311220074993 1 0.6%
 
0.27915866643393294 1 0.6%
 
0.29852626574756397 1 0.6%
 
0.35315096626796705 1 0.6%
 

Population_x
Numeric

Distinct count 157
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 45795000
Minimum 97000
Maximum 1398000000
Zeros (%) 0.0%

Quantile statistics

Minimum 97000
5-th percentile 307800
Q1 3269000
Median 10023000
Q3 31781000
95-th percentile 150120000
Maximum 1398000000
Range 1397900000
Interquartile range 28512000

Descriptive statistics

Standard deviation 161640000
Coef of variation 3.5297
Kurtosis 61.722
Mean 45795000
MAD 56917000
Skewness 7.6198
Sum 7189800000
Variance 2.6128e+16
Memory size 7.5 KiB
Value Count Frequency (%)  
25305000.0 1 0.6%
 
43406000.0 1 0.6%
 
3997000.0 1 0.6%
 
361000.0 1 0.6%
 
5819000.0 1 0.6%
 
11212000.0 1 0.6%
 
31427000.0 1 0.6%
 
17861000.0 1 0.6%
 
29162000.0 1 0.6%
 
16296000.0 1 0.6%
 
Other values (147) 147 93.6%
 

Minimum 5 values

Value Count Frequency (%)  
97000.0 1 0.6%
 
111000.0 1 0.6%
 
112000.0 1 0.6%
 
123000.0 1 0.6%
 
180000.0 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
216565000.0 1 0.6%
 
268419000.0 1 0.6%
 
329153000.0 1 0.6%
 
1391885000.0 1 0.6%
 
1398030000.0 1 0.6%
 

Unit (all except Population)
Constant

This variable is constant and should be ignored for analysis

Constant value %

Year
Constant

This variable is constant and should be ignored for analysis

Constant value 2015

Status
Categorical

Distinct count 2
Unique (%) 1.3%
Missing (%) 0.0%
Missing (n) 0
Developing
127
Developed
30
Value Count Frequency (%)  
Developing 127 80.9%
 
Developed 30 19.1%
 

Life expectancy
Numeric

Distinct count 118
Unique (%) 75.2%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 72.01
Minimum 51
Maximum 88
Zeros (%) 0.0%

Quantile statistics

Minimum 51
5-th percentile 58.08
Q1 66.3
Median 74.1
Q3 77
95-th percentile 82.72
Maximum 88
Range 37
Interquartile range 10.7

Descriptive statistics

Standard deviation 7.9091
Coef of variation 0.10983
Kurtosis -0.22973
Mean 72.01
MAD 6.4375
Skewness -0.53125
Sum 11306
Variance 62.554
Memory size 2.5 KiB
Value Count Frequency (%)  
75.0 4 2.5%
 
61.8 3 1.9%
 
76.1 3 1.9%
 
81.1 3 1.9%
 
75.5 3 1.9%
 
74.6 3 1.9%
 
74.9 3 1.9%
 
74.8 3 1.9%
 
65.7 3 1.9%
 
69.2 2 1.3%
 
Other values (108) 127 80.9%
 

Minimum 5 values

Value Count Frequency (%)  
51.0 1 0.6%
 
52.4 1 0.6%
 
52.5 1 0.6%
 
53.1 1 0.6%
 
53.7 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
83.4 1 0.6%
 
83.7 1 0.6%
 
85.0 2 1.3%
 
86.0 1 0.6%
 
88.0 1 0.6%
 

Adult Mortality
Numeric

Distinct count 117
Unique (%) 74.5%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 148.08
Minimum 1
Maximum 484
Zeros (%) 0.0%

Quantile statistics

Minimum 1
5-th percentile 19
Q1 74
Median 137
Q3 198
95-th percentile 337.6
Maximum 484
Range 483
Interquartile range 124

Descriptive statistics

Standard deviation 94.675
Coef of variation 0.63934
Kurtosis 0.60037
Mean 148.08
MAD 74.362
Skewness 0.85128
Sum 23249
Variance 8963.4
Memory size 2.5 KiB
Value Count Frequency (%)  
118.0 4 2.5%
 
74.0 3 1.9%
 
19.0 3 1.9%
 
13.0 3 1.9%
 
95.0 3 1.9%
 
249.0 3 1.9%
 
16.0 2 1.3%
 
152.0 2 1.3%
 
222.0 2 1.3%
 
146.0 2 1.3%
 
Other values (107) 130 82.8%
 

Minimum 5 values

Value Count Frequency (%)  
1.0 1 0.6%
 
13.0 3 1.9%
 
16.0 2 1.3%
 
17.0 1 0.6%
 
19.0 3 1.9%
 

Maximum 5 values

Value Count Frequency (%)  
357.0 1 0.6%
 
365.0 1 0.6%
 
397.0 1 0.6%
 
413.0 1 0.6%
 
484.0 1 0.6%
 

percentage expenditure
Numeric

Distinct count 3
Unique (%) 1.9%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.7787
Minimum 0
Maximum 364.98
Zeros (%) 98.7%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 0
Q3 0
95-th percentile 0
Maximum 364.98
Range 364.98
Interquartile range 0

Descriptive statistics

Standard deviation 29.643
Coef of variation 10.668
Kurtosis 145.62
Mean 2.7787
MAD 5.4866
Skewness 11.924
Sum 436.25
Variance 878.69
Memory size 2.5 KiB
Value Count Frequency (%)  
0.0 155 98.7%
 
71.27962362 1 0.6%
 
364.9752287 1 0.6%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 155 98.7%
 
71.27962362 1 0.6%
 
364.9752287 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
0.0 155 98.7%
 
71.27962362 1 0.6%
 
364.9752287 1 0.6%
 

BMI
Numeric

Distinct count 126
Unique (%) 80.3%
Missing (%) 0.6%
Missing (n) 1
Infinite (%) 0.0%
Infinite (n) 0
Mean 43.297
Minimum 2.5
Maximum 77.6
Zeros (%) 0.0%

Quantile statistics

Minimum 2.5
5-th percentile 5.825
Q1 24.175
Median 52.6
Q3 61.45
95-th percentile 66.7
Maximum 77.6
Range 75.1
Interquartile range 37.275

Descriptive statistics

Standard deviation 20.804
Coef of variation 0.48049
Kurtosis -1.2007
Mean 43.297
MAD 18.854
Skewness -0.44637
Sum 6754.4
Variance 432.81
Memory size 2.5 KiB
Value Count Frequency (%)  
61.2 3 1.9%
 
62.1 3 1.9%
 
66.1 3 1.9%
 
27.4 3 1.9%
 
19.1 3 1.9%
 
25.4 3 1.9%
 
23.8 3 1.9%
 
63.7 2 1.3%
 
24.3 2 1.3%
 
23.4 2 1.3%
 
Other values (115) 129 82.2%
 

Minimum 5 values

Value Count Frequency (%)  
2.5 1 0.6%
 
3.8 1 0.6%
 
3.9 1 0.6%
 
4.6 1 0.6%
 
4.7 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
68.2 1 0.6%
 
69.6 2 1.3%
 
71.4 1 0.6%
 
74.7 1 0.6%
 
77.6 1 0.6%
 

under-five deaths
Numeric

Distinct count 50
Unique (%) 31.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 32.815
Minimum 0
Maximum 1100
Zeros (%) 28.7%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 3
Q3 21
95-th percentile 101
Maximum 1100
Range 1100
Interquartile range 21

Descriptive statistics

Standard deviation 113.69
Coef of variation 3.4645
Kurtosis 60.121
Mean 32.815
MAD 44.281
Skewness 7.299
Sum 5152
Variance 12925
Memory size 2.5 KiB
Value Count Frequency (%)  
0 45 28.7%
 
1 21 13.4%
 
3 9 5.7%
 
2 9 5.7%
 
12 4 2.5%
 
4 3 1.9%
 
10 3 1.9%
 
21 3 1.9%
 
5 3 1.9%
 
11 3 1.9%
 
Other values (40) 54 34.4%
 

Minimum 5 values

Value Count Frequency (%)  
0 45 28.7%
 
1 21 13.4%
 
2 9 5.7%
 
3 9 5.7%
 
4 3 1.9%
 

Maximum 5 values

Value Count Frequency (%)  
183 1 0.6%
 
194 1 0.6%
 
433 1 0.6%
 
747 1 0.6%
 
1100 1 0.6%
 

Polio
Numeric

Distinct count 38
Unique (%) 24.2%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 83.917
Minimum 6
Maximum 99
Zeros (%) 0.0%

Quantile statistics

Minimum 6
5-th percentile 9
Q1 84
Median 93
Q3 97
95-th percentile 99
Maximum 99
Range 93
Interquartile range 13

Descriptive statistics

Standard deviation 23.265
Coef of variation 0.27723
Kurtosis 4.7838
Mean 83.917
MAD 15.743
Skewness -2.3297
Sum 13175
Variance 541.24
Memory size 2.5 KiB
Value Count Frequency (%)  
99.0 27 17.2%
 
97.0 14 8.9%
 
93.0 11 7.0%
 
95.0 10 6.4%
 
98.0 9 5.7%
 
96.0 8 5.1%
 
88.0 7 4.5%
 
92.0 7 4.5%
 
91.0 6 3.8%
 
89.0 6 3.8%
 
Other values (28) 52 33.1%
 

Minimum 5 values

Value Count Frequency (%)  
6.0 1 0.6%
 
7.0 1 0.6%
 
8.0 4 2.5%
 
9.0 4 2.5%
 
42.0 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
95.0 10 6.4%
 
96.0 8 5.1%
 
97.0 14 8.9%
 
98.0 9 5.7%
 
99.0 27 17.2%
 

Diphtheria
Numeric

Distinct count 38
Unique (%) 24.2%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 85.478
Minimum 6
Maximum 99
Zeros (%) 0.0%

Quantile statistics

Minimum 6
5-th percentile 41.4
Q1 86
Median 93
Q3 97
95-th percentile 99
Maximum 99
Range 93
Interquartile range 11

Descriptive statistics

Standard deviation 21.122
Coef of variation 0.2471
Kurtosis 6.4406
Mean 85.478
MAD 13.944
Skewness -2.5729
Sum 13420
Variance 446.12
Memory size 2.5 KiB
Value Count Frequency (%)  
99.0 23 14.6%
 
95.0 14 8.9%
 
97.0 14 8.9%
 
98.0 13 8.3%
 
93.0 11 7.0%
 
89.0 8 5.1%
 
96.0 8 5.1%
 
91.0 7 4.5%
 
87.0 6 3.8%
 
92.0 5 3.2%
 
Other values (28) 48 30.6%
 

Minimum 5 values

Value Count Frequency (%)  
6.0 2 1.3%
 
8.0 3 1.9%
 
9.0 2 1.3%
 
23.0 1 0.6%
 
46.0 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
95.0 14 8.9%
 
96.0 8 5.1%
 
97.0 14 8.9%
 
98.0 13 8.3%
 
99.0 23 14.6%
 

HIV/AIDS
Numeric

Distinct count 28
Unique (%) 17.8%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.6242
Minimum 0.1
Maximum 9.3
Zeros (%) 0.0%

Quantile statistics

Minimum 0.1
5-th percentile 0.1
Q1 0.1
Median 0.1
Q3 0.3
95-th percentile 3.52
Maximum 9.3
Range 9.2
Interquartile range 0.2

Descriptive statistics

Standard deviation 1.293
Coef of variation 2.0715
Kurtosis 15.607
Mean 0.6242
MAD 0.78276
Skewness 3.5465
Sum 98
Variance 1.672
Memory size 2.5 KiB
Value Count Frequency (%)  
0.1 100 63.7%
 
0.2 11 7.0%
 
0.3 9 5.7%
 
0.4 4 2.5%
 
0.5 4 2.5%
 
2.8 3 1.9%
 
1.0 2 1.3%
 
0.6 2 1.3%
 
2.1 2 1.3%
 
0.9 2 1.3%
 
Other values (18) 18 11.5%
 

Minimum 5 values

Value Count Frequency (%)  
0.1 100 63.7%
 
0.2 11 7.0%
 
0.3 9 5.7%
 
0.4 4 2.5%
 
0.5 4 2.5%
 

Maximum 5 values

Value Count Frequency (%)  
4.1 1 0.6%
 
4.4 1 0.6%
 
4.8 1 0.6%
 
6.2 1 0.6%
 
9.3 1 0.6%
 

Population_y
Numeric

Distinct count 133
Unique (%) 84.7%
Missing (%) 15.9%
Missing (n) 25
Infinite (%) 0.0%
Infinite (n) 0
Mean 11623000
Minimum 2966
Maximum 258160000
Zeros (%) 0.0%

Quantile statistics

Minimum 2966
5-th percentile 31869
Q1 289190
Median 2424500
Q3 10309000
95-th percentile 44781000
Maximum 258160000
Range 258160000
Interquartile range 10020000

Descriptive statistics

Standard deviation 29933000
Coef of variation 2.5754
Kurtosis 42.463
Mean 11623000
MAD 14118000
Skewness 5.9702
Sum 1534200000
Variance 895970000000000
Memory size 7.5 KiB
Value Count Frequency (%)  
487852.0 1 0.6%
 
896829.0 1 0.6%
 
11629553.0 1 0.6%
 
8381.0 1 0.6%
 
126265.0 1 0.6%
 
6312478.0 1 0.6%
 
49163.0 1 0.6%
 
56964.0 1 0.6%
 
17762681.0 1 0.6%
 
622159.0 1 0.6%
 
Other values (122) 122 77.7%
 
(Missing) 25 15.9%
 

Minimum 5 values

Value Count Frequency (%)  
2966.0 1 0.6%
 
8381.0 1 0.6%
 
11247.0 1 0.6%
 
13692.0 1 0.6%
 
26463.0 1 0.6%
 

Maximum 5 values

Value Count Frequency (%)  
48228697.0 1 0.6%
 
78271472.0 1 0.6%
 
81686611.0 1 0.6%
 
181181744.0 1 0.6%
 
258162113.0 1 0.6%
 

Income composition of resources
Highly correlated

This variable is highly correlated with Life expectancy and should be ignored for analysis

Correlation 0.90758

Schooling
Highly correlated

This variable is highly correlated with Income composition of resources and should be ignored for analysis

Correlation 0.92639

Correlations

Sample

Country Animal Products Animal fats Cereals - Excluding Beer Eggs Fish, Seafood Fruits - Excluding Wine Meat Milk - Excluding Butter Offals Oilcrops Pulses Spices Starchy Roots Stimulants Treenuts Vegetal Products Vegetable Oils Vegetables Obesity Confirmed Deaths Recovered Active Population_x Unit (all except Population) Year Status Life expectancy Adult Mortality percentage expenditure BMI under-five deaths Polio Diphtheria HIV/AIDS Population_y Income composition of resources Schooling
0 Afghanistan 21.6397 6.2224 8.0353 0.6859 0.0327 0.4246 6.1244 8.2803 0.3103 1.0452 0.1960 0.2776 0.0490 0.0980 0.7513 28.3684 17.0831 0.3593 4.5 0.021411 0.000492 0.002445 0.018474 38042000.0 % 2015 Developing 65.0 263.0 71.279624 19.1 83 6.0 65.0 0.1 33736494.0 0.479 10.1
1 Albania 32.0002 3.4172 2.6734 1.6448 0.1445 0.6418 8.7428 17.7576 0.2933 3.1622 0.1148 0.0000 0.0510 0.5270 0.9181 17.9998 9.2443 0.6503 22.3 0.033730 0.001085 0.026522 0.006123 2858000.0 % 2015 Developing 77.8 74.0 364.975229 58.0 0 99.0 99.0 0.1 28873.0 0.762 14.2
2 Algeria 14.4175 0.8972 4.2035 1.2171 0.2008 0.5772 3.8961 8.0934 0.1067 1.1983 0.2698 0.1568 0.1129 0.2886 0.8595 35.5857 27.3606 0.5145 26.6 0.017375 0.001309 0.009142 0.006925 43406000.0 % 2015 Developing 75.6 19.0 0.000000 59.5 24 95.0 95.0 0.1 39871528.0 0.743 14.4
3 Angola 15.3041 1.3130 6.5545 0.1539 1.4155 0.3488 11.0268 1.2309 0.1539 3.9902 0.3282 0.0103 0.7078 0.1128 0.0308 34.7010 22.4638 0.1231 6.8 0.000165 0.000010 0.000054 0.000102 31427000.0 % 2015 Developing 52.4 335.0 0.000000 23.3 98 7.0 64.0 1.9 2785935.0 0.531 11.4
4 Antigua and Barbuda 27.7033 4.6686 3.2153 0.3872 1.5263 1.2177 14.3202 6.6607 0.1347 1.3579 0.0673 0.3591 0.0449 1.0549 0.2020 22.2995 14.4436 0.2469 19.1 0.025773 0.003093 0.019588 0.003093 97000.0 % 2015 Developing 76.4 13.0 0.000000 47.7 0 86.0 99.0 0.2 NaN 0.784 13.9

Analysis Note:

Highly correlated =or Above +/- .5 correlation: Schooling w/'animal fat'

Indicator of Health Index:

To control for availability of ventilators or other country financed health interventions and infrastructure to deal with health emergencies, like pandemics.

Notes

1) Note Major Finding of Study, caveat of this data set: "The GHS Index analysis finds no country is fully prepared for epidemics or pandemics. Collectively, international preparedness is weak. Many countries do not show evidence of the health security capacities and capabilities that are needed to prevent, detect, and respond to significant infectious disease outbreaks. The average overall GHS Index score among all 195 countries assessed is 40.2 of a possible score of 100." ~Index: Global Healthy Security Index https://www.ghsindex.org/wp-content/uploads/2019/10/2019-Global-Health-Security-Index.pdf Although 86% of countries invest local or donor funds in health security, few countries pay for health security gap assessments and action plans out of national budgets.

2) Variable: Ventilators

3) Variable: Critical Care Beds Sampling (Per 100,000) Saudi Arabia -22.8 per 100,000 (high) Pakistan - 1.5 (lower-middle) Iran - 4.6 (upper-middle medium) Oman - 14.6 (high) Yemen - 0.0000245614 (700 beds for country population of 28.5 million) Source: https://www.researchgate.net/figure/Number-of-critical-care-beds-per-100-000-population_fig1_338520008 expenditure health - https://link.springer.com/article/10.1007/s00134-012-2627-8

In [25]:
# Define a dictionary containing Health Index 
health_score_data = {'Country': ['United States','United Kingdom', 'Netherlands', 'Australia','Canada','Thailand','Sweden',
                                 'Denmark','South Korea','Finland','France','Slovenia','Switzerland','Germany','Spain','Norway',
                                 'Latvia','Malaysia','Belgium','Portugal','Japan', 'Brazil','Ireland','Singapore','Argentina',
                                 'Austria','Chile','Mexico','Estonia','Indonesia','Italy','Poland','Lithuania','South Africa',
                                 'Hungary','New Zealand','Greece','Croatia','Albania','Turkey','Serbia','Czech Republic','Georgia',
                                 'Armenia','Ecuador','Mongolia','Kyrgyz Republic','Saudi Arabia','Peru','Vietnam','China','Slovakia',
                                 'Philippines','Israel','Kenya','United Arab Emirates','India','Iceland','Kuwait','Romania',
                                 'Bulgaria','Costa Rica','Russia','Uganda','Colombia','El Salvador','Luxembourg','Montenegro','Morocco',
                                 'Panama','Liechtenstein','Myanmar','Laos','Lebanon','Nicaragua','Oman','Cyprus','Moldova',
                                 'Bosnia and Herzegovina','Jordan','Uruguay','Qatar','Kazakhstan','Ethiopia','Bhutan','Madagascar',
                                 'Egypt','Bahrain','Cambodia','North Macedonia','Dominican Republic','Sierra Leone','Zimbabwe','Ukraine',
                                 'Senegal','Nigeria','Iran','Malta','Trinidad and Tobago','Suriname','Tanzania','Bolivia',
                                 'Paraguay','Namibia',"Côte d'Ivoire",'Ghana','Pakistan','Belarus','St. Lucia','Cuba','Liberia','Nepal',
                                 'Bangladesh','Mauritius','Cameroon','Uzbekistan','Azerbaijan','Gambia','Rwanda','Sri Lanka','Maldives',
                                 'Tunisia','St. Vincent and The Grenadines','Micronesia','Guatemala','Guinea','Monaco','Brunei','Togo',
                                 'Afghanistan','Tajikistan','Niger','Barbados','Seychelles','Belize','Turkmenistan', 'Guyana','Haiti',
                                 'Botswana','San Marino','Swaziland','Bahamas','Andorra','Lesotho','Burkina Faso','Cabo Verde',
                                 'Antigua and Barbuda','Jamaica','Mali','Benin','Chad','Zambia','Mozambique','Malawi',
                                 'Papua New Guinea','Honduras','Grenada','Mauritania','Central African Republic','Comoros','Congo','Samoa',
                                 'St. Kitts and Nevis','Sudan','Vanuatu','Timor-Leste','Iraq','Fiji','Libya','Angola','Tonga',
                                 'Dominica','Algeria','Brazzaville','Djibouti','Venezuela','Burundi','Eritrea','Palau','South Sudan','Tuvalu',
                                 'Nauru','Solomon Islands','Niue','Cook Islands','Gabon','Guinea-Bissau','Syria','Kiribati',
                                 'Yemen','Marshall Islands','São Tomé and Príncipe','North Korea','Somalia','Equatorial Guinea'],
                     
        'Health_index': [83.5,77.9,75.6,75.5,75.3,73.2,72.1,70.4,70.2,68.7,68.2,67.2,67.0,66.0,65.9,64.6,62.9,62.2,61.0,
                        60.3,59.8,59.7,59.0,58.7,58.6,58.5,58.3,57.6,57.0,56.6,56.2,55.4,55.0,54.8,54.0,54.0,53.8,53.3,52.9,52.4,
                        52.3,52.0,52.0,50.2,50.1,49.5,49.3,49.3,49.2,49.1,48.2,47.9,47.6,47.3,47.1,46.7,46.5,46.3,46.1,45.8,45.6,
                        45.1,44.3,44.3,44.2,44.2,43.8,43.7,43.7,43.7,43.5,43.4,43.1,43.1,43.1,43.1,43.0,42.9,42.8,42.1,
                        41.3,41.2,40.7,40.6,40.3,40.1,39.9,39.4,39.2,39.1,38.3,38.2,38.2,38.0,37.9,37.8,37.7,37.3,36.6,36.5,36.4,
                        35.8,35.7,35.6,35.5,35.5,35.5,35.3,35.3,35.2,35.1,35.1,35.0,34.9,34.4,34.3,34.2,34.2,34.2,33.9,33.8,33.7,
                        33.0,32.8,32.7,32.7,32.7,32.6,32.5,32.3,32.3,32.2,31.9,31.9,31.8,31.8,31.7,31.5,31.1,31.1,31.1,30.6,30.5,
                        30.2,30.1,29.3,29.0,29.0,29.0,28.8,28.8,28.7,28.1,28.0,27.8,27.6,27.5,27.5,27.3,27.2,26.5,26.4,26.2,26.2,
                        26.1,26.0,25.8,25.7,25.7,25.2,25.1,24.0,23.6,23.6,23.2,23.0,22.8,22.4,21.9,21.7,21.6,20.8,20.7,20.5,
                        20.4,20.0,20.0,19.9,19.2,18.5,18.2,17.7,17.5,16.6,16.2]} 
  
# Convert the dictionary to add into DataFrame 
health_df = pd.DataFrame(health_score_data) 
  
# Using DataFrame.insert() to add the column into covid_df
#covid_df.insert(df2["health_score_data"]) 

#Merge the 2 data sets
merged_df = pd.merge(covid_df, health_df, left_on='Country', right_on='Country')
In [27]:
merged_df.describe()
Out[27]:
Animal Products Animal fats Cereals - Excluding Beer Eggs Fish, Seafood Fruits - Excluding Wine Meat Milk - Excluding Butter Offals Oilcrops ... percentage expenditure BMI under-five deaths Polio Diphtheria HIV/AIDS Population_y Income composition of resources Schooling Health_index
count 145.000000 145.000000 145.000000 145.000000 145.000000 145.000000 145.000000 145.000000 145.000000 145.000000 ... 145.000000 144.000000 145.000000 145.000000 145.000000 145.000000 1.300000e+02 145.000000 145.000000 145.000000
mean 20.523469 4.186661 4.506248 0.961694 0.853244 0.550711 9.116830 5.255140 0.149395 3.347308 ... 3.008654 43.340972 34.110345 83.041379 84.703448 0.657241 1.178883e+07 0.697276 13.077241 42.395172
std 8.081516 3.432512 3.257513 0.666706 0.976601 0.899651 4.345607 3.348297 0.116589 4.752064 ... 30.841796 20.697557 117.879418 23.977094 21.768710 1.336545 3.013329e+07 0.155513 2.938318 13.696793
min 5.018200 0.034800 0.990800 0.058000 0.017400 0.037300 0.906100 0.177900 0.000000 0.064000 ... 0.000000 2.500000 0.000000 6.000000 6.000000 0.100000 2.966000e+03 0.347000 5.400000 18.500000
25% 13.676600 1.611300 2.118700 0.350900 0.318900 0.237300 6.101000 2.229600 0.076300 0.856600 ... 0.000000 24.300000 0.000000 83.000000 84.000000 0.100000 2.970185e+05 0.575000 10.800000 31.900000
50% 20.151100 3.221300 3.447900 0.901900 0.563300 0.351700 8.912700 5.127900 0.118000 1.580300 ... 0.000000 52.600000 3.000000 93.000000 93.000000 0.100000 2.510890e+06 0.734000 13.100000 40.100000
75% 26.910000 6.378700 5.737400 1.266400 1.045700 0.577200 11.558100 7.592600 0.188500 3.438900 ... 0.000000 61.450000 21.000000 97.000000 97.000000 0.300000 1.095208e+07 0.804000 15.300000 52.300000
max 36.901800 14.937300 18.376300 3.275600 8.406800 9.672700 21.606200 17.757600 0.726800 28.563900 ... 364.975229 77.600000 1100.000000 99.000000 99.000000 9.300000 2.581621e+08 0.948000 20.400000 75.600000

8 rows × 37 columns

Visual 1: Health Score Index verus Recovered

In [28]:
a = merged_df['Health_index']
b = merged_df['Recovered']
plt.scatter(a,b, color ='pink')
plt.xlabel('Health Score Index')
plt.ylabel("Recovered")
plt.title('Health Score Index verus Recovered')
plt.show()

Observation:

Note thata HSI indicates access to infrastructure, like more ventilators and critical beds). However, we see that that countries having the highest "Health Score Index" (HSI) do not present the highest 'Recovered' rates for COVID_19 cases. Actually, those in the mid range of HSI, present the highest recovery rate 'Recovered'. Other contributing factors, like nutrition or dietary choices, may factor into the features importance.

We will conduct a 'features importance' test. Below, we will show in visuals 2-5 where our nutritional factors correlate with each other and our target of interest: the "Recovered" rate for COVID-19 cases.

In [29]:
merged_df.columns
Out[29]:
Index(['Country', 'Animal Products', 'Animal fats', 'Cereals - Excluding Beer',
       'Eggs', 'Fish, Seafood', 'Fruits - Excluding Wine', 'Meat',
       'Milk - Excluding Butter', 'Offals', 'Oilcrops', 'Pulses', 'Spices',
       'Starchy Roots', 'Stimulants', 'Treenuts', 'Vegetal Products',
       'Vegetable Oils', 'Vegetables', 'Obesity', 'Confirmed', 'Deaths',
       'Recovered', 'Active', 'Population_x', 'Unit (all except Population)',
       'Year', 'Status', 'Life expectancy ', 'Adult Mortality',
       'percentage expenditure', ' BMI ', 'under-five deaths ', 'Polio',
       'Diphtheria ', ' HIV/AIDS', 'Population_y',
       'Income composition of resources', 'Schooling', 'Health_index'],
      dtype='object')

Missing Values

Need to fill in missing values for Percentage Expenditure, BMI, Polio, Treenuts, Pulses, Vegetables,

In [38]:
merged_df.corr()
Out[38]:
Animal Products Animal fats Cereals - Excluding Beer Eggs Fish, Seafood Fruits - Excluding Wine Meat Milk - Excluding Butter Offals Oilcrops ... percentage expenditure BMI under-five deaths Polio Diphtheria HIV/AIDS Population_y Income composition of resources Schooling Health_index
Animal Products 1.000000 0.710333 -0.478648 0.469318 -0.022442 -0.101656 0.736385 0.641814 0.029119 -0.413549 ... 0.118921 0.437910 -0.199489 0.398435 0.295216 -0.375809 -0.180596 0.701782 0.629505 0.515481
Animal fats 0.710333 1.000000 -0.416339 0.262446 -0.116403 -0.159390 0.251783 0.351140 -0.194524 -0.329150 ... -0.008903 0.379134 -0.036428 0.289351 0.248572 -0.317849 -0.084778 0.596489 0.563103 0.506550
Cereals - Excluding Beer -0.478648 -0.416339 1.000000 -0.308149 -0.054167 0.009134 -0.299150 -0.272598 0.279798 0.114859 ... -0.028851 -0.444254 0.173847 -0.423549 -0.270659 0.515659 0.060297 -0.632969 -0.571845 -0.444253
Eggs 0.469318 0.262446 -0.308149 1.000000 0.223854 -0.052851 0.263722 0.261878 -0.138007 -0.321104 ... 0.077561 0.232045 -0.109087 0.274823 0.222723 -0.389517 0.057395 0.560508 0.451998 0.458177
Fish, Seafood -0.022442 -0.116403 -0.054167 0.223854 1.000000 0.018571 -0.006628 -0.259120 -0.099272 0.346591 ... -0.073124 -0.113074 -0.062338 -0.010877 -0.042181 -0.077028 0.035187 0.010264 0.001983 -0.068667
Fruits - Excluding Wine -0.101656 -0.159390 0.009134 -0.052851 0.018571 1.000000 -0.024509 -0.047790 0.079720 0.032919 ... 0.006071 -0.065394 -0.037703 0.037670 -0.026199 -0.047501 0.023415 -0.092409 -0.057522 -0.104336
Meat 0.736385 0.251783 -0.299150 0.263722 -0.006628 -0.024509 1.000000 0.163367 0.209309 -0.228055 ... -0.018125 0.273449 -0.263508 0.226785 0.094194 -0.137876 -0.192964 0.405613 0.372530 0.244104
Milk - Excluding Butter 0.641814 0.351140 -0.272598 0.261878 -0.259120 -0.047790 0.163367 1.000000 0.019658 -0.403036 ... 0.321355 0.310507 -0.062141 0.325657 0.315474 -0.312569 -0.122456 0.450697 0.376794 0.343129
Offals 0.029119 -0.194524 0.279798 -0.138007 -0.099272 0.079720 0.209309 0.019658 1.000000 0.027023 ... 0.123583 -0.180217 -0.001941 -0.187155 -0.346686 0.298875 -0.039800 -0.271129 -0.253385 -0.175541
Oilcrops -0.413549 -0.329150 0.114859 -0.321104 0.346591 0.032919 -0.228055 -0.403036 0.027023 1.000000 ... -0.010976 -0.163590 0.038290 -0.264518 -0.202584 0.109738 -0.009143 -0.369430 -0.344473 -0.382370
Pulses -0.419936 -0.311695 0.409971 -0.327025 -0.092894 0.497729 -0.320564 -0.188129 0.071100 0.145602 ... -0.037511 -0.361713 0.280064 -0.110328 -0.119864 0.172618 -0.006194 -0.516880 -0.477461 -0.267498
Spices -0.209950 -0.194603 0.100376 0.011249 0.221450 0.003150 -0.237087 -0.062440 -0.109078 0.127786 ... -0.049728 -0.176725 0.147135 0.015127 0.114354 -0.040463 0.023170 -0.134220 -0.133312 -0.087904
Starchy Roots -0.387591 -0.302344 0.202710 -0.350042 0.172502 0.447195 -0.166073 -0.395585 0.143363 0.304605 ... -0.046320 -0.352279 0.137222 -0.212275 -0.298081 0.295374 0.120540 -0.449930 -0.341704 -0.321883
Stimulants 0.495903 0.296297 -0.270974 0.289669 0.020412 -0.081584 0.290229 0.455033 -0.062976 -0.274782 ... -0.023394 0.242331 -0.204802 0.271028 0.231936 -0.257840 -0.149345 0.474446 0.394878 0.214624
Treenuts 0.188401 0.164782 -0.253826 0.306574 0.201834 -0.098389 -0.038856 0.222835 -0.181652 -0.209238 ... 0.025898 0.238155 -0.088916 0.203553 0.230329 -0.182906 0.020511 0.318894 0.289013 0.258659
Vegetal Products -1.000000 -0.710321 0.478633 -0.469364 0.022396 0.101486 -0.736382 -0.641805 -0.029135 0.413568 ... -0.118908 -0.437913 0.199511 -0.398488 -0.295246 0.375775 0.180667 -0.701788 -0.629521 -0.515501
Vegetable Oils -0.669238 -0.389604 0.023742 -0.211264 -0.240058 -0.072968 -0.548870 -0.385380 -0.175089 -0.243485 ... -0.116374 -0.184612 0.128582 -0.126639 -0.112706 0.157890 0.190102 -0.282703 -0.242394 -0.127352
Vegetables 0.071712 -0.100507 -0.012157 0.183638 -0.018436 0.033612 -0.026554 0.278164 0.036590 -0.126193 ... 0.153100 0.060095 0.054947 0.067792 0.066522 -0.165925 0.090252 0.055665 -0.016862 -0.041928
Obesity 0.435854 0.392369 -0.489569 0.291984 -0.147099 -0.088428 0.290266 0.266737 -0.257126 -0.119119 ... 0.006395 0.780198 -0.327034 0.312374 0.305877 -0.375988 -0.076333 0.687399 0.627812 0.327343
Confirmed 0.385509 0.363431 -0.358845 0.273573 0.066726 -0.053812 0.188795 0.247044 -0.147456 -0.278321 ... -0.040638 0.440685 -0.161320 0.270600 0.173125 -0.229008 -0.014421 0.536300 0.472649 0.456909
Deaths 0.244084 0.340994 -0.296291 0.118489 -0.017931 -0.065005 0.050055 0.160422 -0.159473 -0.193428 ... -0.029572 0.310681 -0.098364 0.176261 0.142232 -0.155151 0.020222 0.399791 0.383512 0.472358
Recovered 0.348826 0.310547 -0.294936 0.169268 -0.002495 -0.044328 0.204846 0.232973 -0.144680 -0.223579 ... -0.025195 0.363214 -0.129881 0.218277 0.163858 -0.189420 -0.008604 0.472593 0.421753 0.355128
Active 0.249087 0.241329 -0.263337 0.303952 0.148323 -0.035477 0.084228 0.144775 -0.066046 -0.218805 ... -0.042826 0.331412 -0.128972 0.213607 0.094970 -0.172642 -0.022490 0.354137 0.298410 0.347919
Population_x 0.002143 0.017692 0.005893 0.130911 -0.005898 -0.040584 0.016784 -0.062844 0.103447 -0.026496 ... -0.021487 -0.124958 0.697445 0.003707 0.013292 -0.043567 0.119405 -0.038034 -0.047860 0.091874
Year NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Life expectancy 0.634044 0.539625 -0.553037 0.565338 0.055552 -0.032489 0.320719 0.442948 -0.312382 -0.333776 ... 0.045271 0.537568 -0.267914 0.534340 0.463969 -0.614532 -0.065091 0.907300 0.818421 0.650942
Adult Mortality -0.432882 -0.369972 0.403518 -0.389155 -0.046743 0.036476 -0.166003 -0.369649 0.306215 0.162789 ... -0.044914 -0.398911 0.206128 -0.412024 -0.291840 0.674491 0.056156 -0.654776 -0.546813 -0.413978
percentage expenditure 0.118921 -0.008903 -0.028851 0.077561 -0.073124 0.006071 -0.018125 0.321355 0.123583 -0.010976 ... 1.000000 0.039545 -0.017123 0.003127 0.039444 -0.040954 -0.021546 0.011676 0.015139 0.051198
BMI 0.437910 0.379134 -0.444254 0.232045 -0.113074 -0.065394 0.273449 0.310507 -0.180217 -0.163590 ... 0.039545 1.000000 -0.243160 0.307705 0.233360 -0.308911 -0.016152 0.628581 0.603645 0.367911
under-five deaths -0.199489 -0.036428 0.173847 -0.109087 -0.062338 -0.037703 -0.263508 -0.062141 -0.001941 0.038290 ... -0.017123 -0.243160 1.000000 -0.161487 -0.151832 0.126638 0.307181 -0.250177 -0.253128 -0.055520
Polio 0.398435 0.289351 -0.423549 0.274823 -0.010877 0.037670 0.226785 0.325657 -0.187155 -0.264518 ... 0.003127 0.307705 -0.161487 1.000000 0.639053 -0.428879 -0.265636 0.500279 0.413611 0.369507
Diphtheria 0.295216 0.248572 -0.270659 0.222723 -0.042181 -0.026199 0.094194 0.315474 -0.346686 -0.202584 ... 0.039444 0.233360 -0.151832 0.639053 1.000000 -0.337005 -0.078368 0.430368 0.388690 0.325217
HIV/AIDS -0.375809 -0.317849 0.515659 -0.389517 -0.077028 -0.047501 -0.137876 -0.312569 0.298875 0.109738 ... -0.040954 -0.308911 0.126638 -0.428879 -0.337005 1.000000 0.042649 -0.497048 -0.396047 -0.302289
Population_y -0.180596 -0.084778 0.060297 0.057395 0.035187 0.023415 -0.192964 -0.122456 -0.039800 -0.009143 ... -0.021546 -0.016152 0.307181 -0.265636 -0.078368 0.042649 1.000000 -0.004567 0.013002 0.135824
Income composition of resources 0.701782 0.596489 -0.632969 0.560508 0.010264 -0.092409 0.405613 0.450697 -0.271129 -0.369430 ... 0.011676 0.628581 -0.250177 0.500279 0.430368 -0.497048 -0.004567 1.000000 0.925844 0.696775
Schooling 0.629505 0.563103 -0.571845 0.451998 0.001983 -0.057522 0.372530 0.376794 -0.253385 -0.344473 ... 0.015139 0.603645 -0.253128 0.413611 0.388690 -0.396047 0.013002 0.925844 1.000000 0.700796
Health_index 0.515481 0.506550 -0.444253 0.458177 -0.068667 -0.104336 0.244104 0.343129 -0.175541 -0.382370 ... 0.051198 0.367911 -0.055520 0.369507 0.325217 -0.302289 0.135824 0.696775 0.700796 1.000000

37 rows × 37 columns

Visual 2: Visualizing Nutritional Relation with Recovery Rate

In [40]:
merged_df['Recovered'].value_counts().plot.bar(title='Frequency Distribution of Recovered')
Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x123eb2a90>

Visual 3: Nutrition

In [41]:
c = merged_df['Pulses']
d = merged_df['Recovered']
plt.scatter(c,d, color = 'purple')
plt.xlabel('Heavy Pulse Diet')
plt.ylabel("Recovered")
plt.title('High Pulse Diet Related to Recovery Rate')
plt.show()

Visual 4: Nutrition Pairplot of various diets' correlation

In [42]:
nutrition_df1 = merged_df.drop(['Deaths','Obesity',  
                                'under-five deaths ',
                                'Recovered', 'Active',
                                'Country','under-five deaths ', 
                                'Unit (all except Population)', 'Year',
                                'Status',' BMI ','Polio','percentage expenditure',
                                'Confirmed','Diphtheria ', ' HIV/AIDS', 
                                'Population_x', 'Population_y',
                                'Life expectancy ', 'Adult Mortality',
        ], axis=1)
#nutrition_df= nutrition_df.drop(['treenuts_categorical'], axis=1)
 #'Obesity',  'Deaths',
       #'Recovered', 'Active', 'Population_x', 'Unit (all except Population)',
       #'Year', 'Status', 'Life expectancy ', 'Adult Mortality',
       #'percentage expenditure', ' BMI ', 'under-five deaths ', 'Polio',
       #'Diphtheria ', ' HIV/AIDS',


plt.title('Nutritional Pairplotting of Different Diets')
sns.pairplot(nutrition_df1)
Out[42]:
<seaborn.axisgrid.PairGrid at 0x1245b4450>

Decision Tree (Regression)

Target Variable: Recovery rate = 'Recovered' as continuous value

Independents: Series of Nutritional Diets

In [43]:
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import accuracy_score 

#X = merged_df[['percentage expenditure',' BMI ','Polio','Treenuts','Pulses','Meat','Vegetables','Health_index']]
#X = merged_df['Treenuts','Health_index'] #Dropping a bunch to check
merged_df_dropped = merged_df.dropna()
# X = merged_df_dropped[['Treenuts', 'Health_index']]
X = merged_df_dropped.drop(columns=['Deaths', 
                                    'under-five deaths ',
                                    'Recovered', 
                                    'Country', 
                                    'Unit (all except Population)', 
                                    'Status', 
                                    'Confirmed', 
                                    'Population_x', 
                                    'Population_y'])
y = merged_df_dropped['Recovered']

#Split data to train and test with 20% sample 
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=.2,random_state =5)

#regressor = DecisionTreeRegressor()
dtr_model = DecisionTreeRegressor()
dtr_model.fit(X_train, y_train)  
Out[43]:
DecisionTreeRegressor()

Note:

We will identify the most important features in the training set in our Decision Tree Regression model.

In [46]:
print(dtr_model.score(X_train, y_train))
1.0
In [47]:
print(dtr_model.score(X_test, y_test))
-0.6108109683200682
In [70]:
pd.Series(dtr_model.feature_importances_, index=X_train.columns).sort_values(ascending=False)
Out[70]:
Life expectancy                    4.344490e-01
Starchy Roots                      3.403535e-01
Cereals - Excluding Beer           1.021883e-01
Active                             3.891012e-02
Offals                             2.390078e-02
Schooling                          1.769176e-02
Vegetables                         8.377621e-03
Fish, Seafood                      6.602875e-03
Income composition of resources    6.562577e-03
Diphtheria                         5.603951e-03
Spices                             4.725922e-03
Fruits - Excluding Wine            3.783575e-03
Vegetable Oils                     2.641331e-03
Eggs                               2.057723e-03
Treenuts                           1.012967e-03
Milk - Excluding Butter            8.068200e-04
Stimulants                         1.720608e-04
Polio                              5.463217e-05
 BMI                               4.997524e-05
Adult Mortality                    2.979620e-05
Animal Products                    7.596087e-06
Pulses                             6.669819e-06
 HIV/AIDS                          6.386506e-06
Oilcrops                           1.597028e-06
Vegetal Products                   1.546011e-06
Health_index                       9.333074e-07
Meat                               5.404925e-08
Obesity                            0.000000e+00
Year                               0.000000e+00
percentage expenditure             0.000000e+00
Animal fats                        0.000000e+00
dtype: float64

Observation:

Our Decision Tree Regression model overfit the training data with a score of '1', while severely not fitting the test data above with -.01. We will run a second model: Linear Regression model. We will use the same target variable 'Recovered' and the above independent variables.

Linear Regression Model

Note:

We will check the score of our Linear Regression model on our test training set.

In [49]:
line.score(X_train, y_train)
Out[49]:
0.3861260490273343
In [71]:
line.score(X_test, y_test)
Out[71]:
0.20605411393167627
In [50]:
#What do our coefficients say regarding our target ('Regressor')?
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score

# We add a constant to the model as it's a best practice
# to do so every time!
X = sm.add_constant(X)

# We fit an OLS model using statsmodels
results = sm.OLS(y, X).fit()

# We print the summary results
print(results.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              Recovered   R-squared:                       0.385
Model:                            OLS   Adj. R-squared:                  0.184
Method:                 Least Squares   F-statistic:                     1.918
Date:                Mon, 15 Jun 2020   Prob (F-statistic):            0.00963
Time:                        18:26:47   Log-Likelihood:                 145.39
No. Observations:                 123   AIC:                            -228.8
Df Residuals:                      92   BIC:                            -141.6
Df Model:                          30                                         
Covariance Type:            nonrobust                                         
===================================================================================================
                                      coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------------
Animal Products                    -0.0103      2.351     -0.004      0.997      -4.680       4.659
Animal fats                         2.0768      1.646      1.261      0.210      -1.193       5.347
Cereals - Excluding Beer            0.3624      0.173      2.092      0.039       0.018       0.706
Eggs                                2.0473      1.648      1.242      0.217      -1.226       5.320
Fish, Seafood                       2.0922      1.647      1.270      0.207      -1.179       5.363
Fruits - Excluding Wine             0.3594      0.174      2.064      0.042       0.014       0.705
Meat                                2.0782      1.647      1.262      0.210      -1.192       5.349
Milk - Excluding Butter             2.0744      1.646      1.260      0.211      -1.195       5.344
Offals                              2.0540      1.651      1.244      0.217      -1.225       5.334
Oilcrops                            0.3630      0.172      2.108      0.038       0.021       0.705
Pulses                              0.4072      0.177      2.303      0.024       0.056       0.758
Spices                              0.3491      0.176      1.980      0.051      -0.001       0.699
Starchy Roots                       0.3638      0.168      2.164      0.033       0.030       0.698
Stimulants                          0.3992      0.177      2.259      0.026       0.048       0.750
Treenuts                            0.3615      0.173      2.088      0.040       0.018       0.705
Vegetal Products                    1.7013      2.352      0.723      0.471      -2.970       6.373
Vegetable Oils                      0.3651      0.173      2.111      0.037       0.022       0.709
Vegetables                          0.3540      0.176      2.016      0.047       0.005       0.703
Obesity                            -0.0033      0.002     -1.744      0.085      -0.007       0.000
Active                              0.0716      0.166      0.430      0.668      -0.259       0.402
Year                               -0.0514      0.058     -0.883      0.379      -0.167       0.064
Life expectancy                     0.0041      0.003      1.205      0.231      -0.003       0.011
Adult Mortality                  -6.06e-05      0.000     -0.380      0.705      -0.000       0.000
percentage expenditure           1.844e-05      0.000      0.069      0.945      -0.001       0.001
 BMI                                0.0010      0.001      1.634      0.106      -0.000       0.002
Polio                              -0.0005      0.001     -0.932      0.354      -0.002       0.001
Diphtheria                       5.715e-05      0.001      0.108      0.915      -0.001       0.001
 HIV/AIDS                           0.0099      0.009      1.041      0.301      -0.009       0.029
Income composition of resources     0.3613      0.240      1.506      0.135      -0.115       0.838
Schooling                          -0.0069      0.008     -0.819      0.415      -0.024       0.010
Health_index                       -0.0011      0.001     -1.050      0.297      -0.003       0.001
==============================================================================
Omnibus:                       85.237   Durbin-Watson:                   2.148
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              528.659
Skew:                           2.408   Prob(JB):                    1.60e-115
Kurtosis:                      11.941   Cond. No.                     1.10e+06
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.1e+06. This might indicate that there are
strong multicollinearity or other numerical problems.

Observations

We see that the score is much lower on our test set. This means that our second model also overfit the data to the training set.

Also, below, We see that our independent variable of interest "Tree Nuts" did not rank in the top 10. This means it does not have as much significance as we hypothesized.

The top ten nutritional features of importance (independent variables) are as follows for the Linear Regression Model: 1) 'Starchy Roots', 2)'Cereals - Excluding Beer', 3) 'Offals', 4) 'Vegetables', 5) 'Fish, Seafood', 6) 'Animal fats', 7) 'Spices', 8) 'Fruits - Excluding Wine', 9)'Vegetable Oils', 10)'Eggs'. 'Treenuts' did not rank high in the Decision Tree Regression model's features of importance.

However, in reviewing our Ordinary Least Squares model (Linear Regression) our'Treenuts' coefficient regressing on the target variable, 'Recovery', we calculated 0.3615 with a p-value of .04--which is pretty strong.

Because our two models (DTR and Linear Regression model) overfit our training data, we will use the Random Forest model (RFM) in addition to tuning our parameters. We selected RFM because it is ideal for larger data and estimates missing data while running a series of decision trees with different combinations of our features influencing (or not influencing) our target variable: COVID-19 'Recovery' rate. For example, we also dropped these features: 'Death', 'under-age five' deaths, and 'Confirmed ' because they are not in the sample of infected with COVID-19.

Random Forest Regression Model

We will Uise GridSearchCV tool to tune parameters to help with overfitting in our final model.

In [52]:
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor(max_depth=2, n_estimators=1000)

rfr.fit(X_train, y_train)

print(rfr.score(X_train, y_train))
print(rfr.score(X_test, y_test))
0.7639240986409821
0.25121836550692256
In [53]:
from sklearn.model_selection import GridSearchCV

params = {
    'n_estimators': [50,100,200,500],
    'criterion': ['mse', 'mae'],
    'max_depth': [2,3,None],
    'min_samples_split': [1,2,3],
    'min_samples_leaf': [1,2,4,6]
}

random_forest_grid = GridSearchCV(RandomForestRegressor(), param_grid=params, verbose=1, cv=3)
random_forest_grid.fit(X_train, y_train)
Fitting 3 folds for each of 288 candidates, totalling 864 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 864 out of 864 | elapsed:  5.7min finished
Out[53]:
GridSearchCV(cv=3, estimator=RandomForestRegressor(),
             param_grid={'criterion': ['mse', 'mae'], 'max_depth': [2, 3, None],
                         'min_samples_leaf': [1, 2, 4, 6],
                         'min_samples_split': [1, 2, 3],
                         'n_estimators': [50, 100, 200, 500]},
             verbose=1)
In [54]:
random_forest_grid.best_params_
Out[54]:
{'criterion': 'mse',
 'max_depth': 3,
 'min_samples_leaf': 6,
 'min_samples_split': 2,
 'n_estimators': 50}
In [55]:
random_forest_grid.best_score_
Out[55]:
0.3370534915128684
In [56]:
better_rfr = RandomForestRegressor(**random_forest_grid.best_params_)

better_rfr.fit(X_train, y_train)

print(better_rfr.score(X_train, y_train))
print(better_rfr.score(X_test, y_test))
0.5116605901627048
0.31966375847390704

Feature Importance

In [57]:
pd.Series(better_rfr.feature_importances_, index=X_train.columns).sort_values(ascending=False)
Out[57]:
Life expectancy                    0.527670
Income composition of resources    0.221833
Active                             0.054493
Cereals - Excluding Beer           0.052489
Animal Products                    0.036105
Pulses                             0.020531
Vegetal Products                   0.015512
Fish, Seafood                      0.015260
Animal fats                        0.012240
Stimulants                         0.010632
Vegetable Oils                     0.008349
Starchy Roots                      0.006505
Milk - Excluding Butter            0.005257
Polio                              0.004526
Offals                             0.001686
Fruits - Excluding Wine            0.001521
Health_index                       0.001457
 BMI                               0.001361
Diphtheria                         0.000926
Vegetables                         0.000578
Treenuts                           0.000389
Schooling                          0.000368
Oilcrops                           0.000313
Eggs                               0.000000
 HIV/AIDS                          0.000000
percentage expenditure             0.000000
Meat                               0.000000
Adult Mortality                    0.000000
Obesity                            0.000000
Year                               0.000000
Spices                             0.000000
dtype: float64

Visual 5: Bar Plot of Features Importance Among Selected Factors

In [58]:
(pd.Series(better_rfr.feature_importances_, index=X_train.columns)*100).sort_values(ascending=True).plot(kind='barh', title='Feature Importance Among Selected Factors (in %) - Higher is Better')
Out[58]:
<matplotlib.axes._subplots.AxesSubplot at 0x141399750>

Observations

In the Random Forest Model, our top ten features of importance do not list 'Treenuts'. We see that our independent variable of interest "Tree Nuts" did not rank in the top 10. This means it does not have as much significance as we hypothesized.

The top 10 dietary items include: 1) 'Life expectancy', 2) 'Pulses', 'Deaths', 'Cereals', 5) 'Schooling', 6) 'Animal fats', 7) 'Stimulants', 8) 'Milk - Excluding Butter', 9) 'Vegetables', and 10)'Offals'.

Note:

Thressholds: Organizing Numerical Values in Diet into Categorical for Comparison

We will set a thresshold of 'Recovery' to 10 percent to see countries that fall into that classifying group.

In [65]:
merged_df[merged_df['Recovered'] <= 0.1]
Out[65]:
Country Animal Products Animal fats Cereals - Excluding Beer Eggs Fish, Seafood Fruits - Excluding Wine Meat Milk - Excluding Butter Offals ... percentage expenditure BMI under-five deaths Polio Diphtheria HIV/AIDS Population_y Income composition of resources Schooling Health_index
0 Afghanistan 21.6397 6.2224 8.0353 0.6859 0.0327 0.4246 6.1244 8.2803 0.3103 ... 71.279624 19.1 83 6.0 65.0 0.1 33736494.0 0.479 10.1 32.3
1 Albania 32.0002 3.4172 2.6734 1.6448 0.1445 0.6418 8.7428 17.7576 0.2933 ... 364.975229 58.0 0 99.0 99.0 0.1 28873.0 0.762 14.2 52.9
2 Algeria 14.4175 0.8972 4.2035 1.2171 0.2008 0.5772 3.8961 8.0934 0.1067 ... 0.000000 59.5 24 95.0 95.0 0.1 39871528.0 0.743 14.4 23.6
3 Angola 15.3041 1.3130 6.5545 0.1539 1.4155 0.3488 11.0268 1.2309 0.1539 ... 0.000000 23.3 98 7.0 64.0 1.9 2785935.0 0.531 11.4 25.2
4 Antigua and Barbuda 27.7033 4.6686 3.2153 0.3872 1.5263 1.2177 14.3202 6.6607 0.1347 ... 0.000000 47.7 0 86.0 99.0 0.2 NaN 0.784 13.9 29.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
139 Uruguay 25.5069 3.4811 2.5698 1.2804 0.3281 0.1777 12.2841 8.0603 0.0729 ... 0.000000 64.0 0 95.0 95.0 0.1 3431552.0 0.794 15.5 41.3
140 Uzbekistan 25.9903 2.4884 2.7168 1.0639 0.0962 0.5830 10.3624 11.8050 0.1743 ... 0.000000 44.7 17 99.0 99.0 0.1 312989.0 0.697 12.1 34.3
142 Yemen 12.5401 2.0131 11.5271 0.5514 0.3847 0.2564 8.0010 1.3463 0.2436 ... 0.000000 41.3 47 63.0 69.0 0.1 NaN 0.499 9.0 18.5
143 Zambia 9.6005 1.6113 14.3225 0.6266 1.0070 0.1343 4.9010 1.2756 0.1790 ... 0.000000 23.4 40 9.0 9.0 4.1 161587.0 0.576 12.5 28.7
144 Zimbabwe 10.3796 2.9543 9.7922 0.3682 0.2455 0.0614 4.5674 2.1040 0.1315 ... 0.000000 31.8 32 88.0 87.0 6.2 15777451.0 0.507 10.3 38.2

119 rows × 40 columns

Sorting Eastern Mediterranean Countries

We are curious about a region with a high 'Treenuts' content in their daily diet. So we selected the Eastern Mediterranean region from the World Health Organization grouping.

Note:

We will set a thresshold of 'Treenut' to .015 (1.5 percent daily intake) to see countries that fall into that classifying group of consuming tree nuts and level of incorporating.

Visual 6: Visualizing 'Recovery Rate Across Eastern Mediterranean Countries'

Hardest Hit Countries Comparison

Goal: To plot countries' recovery rates w/r/t nutritional categories--subclass plotting | Plot counts for each country's recovery rate.

In [67]:
#Inserting top ranked countries w/highest number of COVID-19 cases (Italy, US, and Brazil(more recently))
chart = sns.catplot(x="Country", y="Recovered", hue='treenuts_categorical', kind="bar", 
                                            data=merged_df[merged_df['Country'].isin(['Italy',
                                            'United States', 'Brazil','Iran','Lebanon', 'Afghanistan','Kuwait', 
                                            'Pakistan', 'Saudi Arabia','Jordan','Syria','Yemen','Egypt',
                                            'United Arab Emirates','Oman','Bahrain','Qatar','Morocco','Libya',
                                            'Tunisia','Iraq'])]);
chart.set_xticklabels(rotation=45, horizontalalignment='right')
plt.title('Recovery Rate Across Eastern Mediterranean Countries Regarding Treenuts Diet')
Out[67]:
Text(0.5, 1, 'Recovery Rate Across Eastern Mediterranean Countries Regarding Treenuts Diet')

Results:

After running three models and tuning our parameters, we improved both our training and test set scores. For example, our train score of 0.512 improved from 0.3861 in our LR model. Meanwhile, our Random Forest Regressor model's test of 0.386 improved upon our LR model's lower test score of 0.206 after parameter tuning.

Our Random Forest Model dealt with the entropy, or noise, that is introduced when many features are included in a model to measure our target of COVID-19 recovery rate as 'Recovered'. Many factors play a role. Surprisingly, the Health Score Index did not carry the majority of the explanatory power. As seen in our first visual, the scatterplot did not show a one to one correlation. Countries with a medium Health Score Index (HSI)had higher COVID-19 recovery rates compared to some countries with a higher HSI. At this point, we considered additional factors beyond state investement, like nutrtition and regional dietary choices. We selected a region, Eastern Mediterranean, to review their nutritional impact on 'Recovered' rate by considering how nutrition may build immunity to stave off viruses. The Eastern Mediterranean region emerged as a choice because they experienced the Middle East Respiratory Syndrome in 2015.

Subpoints to note show that the Eastern Mediterranean countries included (Afghanistan, Egypt, Iraq, Jordan, Kuwait, Lebanon, Morocco, Oman, Pakistan, Saudi Arabia, Tunisia, United Arab Emirates, and Yemen) present a range between .076 (Yemen) to 3.82 (UAE) regarding 'Treenuts' composite in diet. This range may be attributed to income per capita. Specifically, the lowest end of spectrum is Yemen, a lower-income country, in contrast to the highest end of spectrum UAE, a high-income country. We would need to do a comparative analysis to see if the same trend occurs with this diet in another region and see if income disproportionally affects the consumption of tree nuts.

Gaps for Further Study:

Upon reflection, we could have controlled for income and its access to nutrition. More specifically, nutrition includes higher-priced food items that are more expensive to transport from mountainous to desert regions. If we were to run a regression model on nutritional items and price, consuming 'Tree nuts' (almonds, pine nuts, hazlenuts, pistachios and walnuts) would present a premium.

To better test the regional nutritional theory positively influencing the COVID-19 recovery rate, we should have sampled the top 10 percent of the population in each Eastern Mediterranean country. Then we would have been better compare the access to high-priced food items, like 'Tree nuts' category, which are more readily available and prepared in higher-priced dishes, like desserts and meat-based rice. These are dishes that populations below the poverty line cannot afford, and thereby cannot easily incorporate into their daily diets to build immunity and ultimately lead to likely recovery--if infected by COVID-19.

Additionally, we can supplement our nutritional information by including total protein contents across nutritional categories and combine with features covering percentages of energy (in kilo calories) consumed from each type of food listed. Then, we may conduct a Principal Component Analysis across the subcategories of features across nutrition to give a more precise slice of nutrition per region across the upper 10 percent of the population.